abhikul0 9 hours ago
I'll try to use that, but llama-server has mmap on by default, and the process's resident memory still grows to the full size of the model. Not sure what's going on.
zozbot234 9 hours ago | parent
Try running CPU-only inference to troubleshoot that: layers offloaded to the GPU will likely just ignore mmap.
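Worth noting why mmap can still look like it "takes up" the model's size in RAM: pages of a mapped file are faulted in on first access and counted in the process's RSS, even though they are file-backed page cache the kernel can evict under memory pressure, so tools like `top` show the full model size as resident. A minimal sketch of that behavior (assuming Linux/macOS; the 4 MiB dummy file stands in for model weights):

```python
import mmap
import os
import tempfile

# Create a small file standing in for the model weights (hypothetical).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * (4 * 1024 * 1024))  # 4 MiB of dummy "weights"
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    size = os.fstat(fd).st_size
    # Read-only mapping, analogous to llama.cpp's mmap path: no read()
    # copy is made up front; each page faults in on first access and is
    # then counted in the process RSS as file-backed page cache.
    mm = mmap.mmap(fd, size, prot=mmap.PROT_READ)
    total = 0
    for off in range(0, size, 4096):  # touch every page to fault it in
        total += mm[off]
    mm.close()
finally:
    os.close(fd)
    os.unlink(path)

# Every touched byte was 'x'; after this loop, RSS has grown by roughly
# the file size, yet those pages remain evictable (unless mlock'd).
print(total == (size // 4096) * ord("x"))
```

This also suggests why GPU offload behaves differently: weights copied into VRAM are read out of the mapping (or loaded directly), so the mapped pages no longer need to stay resident on the host side.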