Remix clone Hacker News

	▲	hoppp 7 days ago
		It probably loads the entire model into RAM at once, whereas Ollama avoids this with a better loading strategy.
	▲	blooalien 7 days ago
		Yeah, if I remember correctly, Ollama loads models in "layers" and is capable of putting some layers in GPU RAM and the rest in regular system RAM.
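
		To make the layer-splitting idea concrete, here is a rough sketch of how a loader might decide which layers go where. This is not Ollama's actual code; the function name `split_layers`, the per-layer sizes, and the VRAM budget are all made up for illustration, and a real loader would also have to account for things like the KV cache and scratch buffers.

```python
def split_layers(layer_sizes_mb, vram_budget_mb):
    """Greedily assign leading layers to the GPU until the budget runs out.

    Hypothetical helper: returns (gpu_layer_indices, cpu_layer_indices).
    Once one layer does not fit, all remaining layers stay in system RAM,
    so the GPU always holds a contiguous prefix of the model.
    """
    gpu, cpu = [], []
    used_mb = 0
    for index, size_mb in enumerate(layer_sizes_mb):
        fits = used_mb + size_mb <= vram_budget_mb
        if not cpu and fits:
            gpu.append(index)
            used_mb += size_mb
        else:
            cpu.append(index)
    return gpu, cpu


if __name__ == "__main__":
    # Assumed numbers: 32 layers of 500 MB each, 8000 MB of free VRAM.
    layers = [500] * 32
    gpu_layers, cpu_layers = split_layers(layers, 8000)
    print(f"GPU layers: {len(gpu_layers)}, CPU layers: {len(cpu_layers)}")
```

		With those assumed numbers, half of the model ends up on the GPU and the rest stays in regular system RAM, which is the kind of partial offload described above.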