hoppp 7 days ago

It probably loads the entire model into RAM at once, whereas Ollama avoids this with a better loading strategy.

blooalien 7 days ago | parent [-]

Yeah, if I remember correctly, Ollama loads models in "layers" and is capable of putting some layers in GPU RAM and the rest in regular system RAM.
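
Roughly, the idea can be sketched like this. This is a toy illustration only, not Ollama's actual code: the function name `split_layers`, the `Placement` type, and the per-layer sizes are all made up for the example, and it assumes each layer's size is known up front and that layers are assigned to the GPU in order until a fixed VRAM budget runs out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Placement:
    gpu_layers: List[int]  # indices of layers kept in GPU RAM
    cpu_layers: List[int]  # indices of layers kept in system RAM


def split_layers(layer_sizes_mb: List[int], vram_budget_mb: int) -> Placement:
    """Hypothetical helper: assign leading layers to the GPU until the
    VRAM budget is used up, then send every remaining layer to system RAM."""
    gpu: List[int] = []
    cpu: List[int] = []
    used = 0
    for idx, size in enumerate(layer_sizes_mb):
        # Once one layer has spilled to the CPU, keep all later layers there
        # too, so the GPU always holds one contiguous run of leading layers.
        if not cpu and used + size <= vram_budget_mb:
            gpu.append(idx)
            used += size
        else:
            cpu.append(idx)
    return Placement(gpu_layers=gpu, cpu_layers=cpu)


if __name__ == "__main__":
    # Eight layers of 100 MB each with a 350 MB budget: three fit on the GPU.
    print(split_layers([100] * 8, vram_budget_mb=350))
```

With a budget of zero everything lands in system RAM, and with a large enough budget everything lands on the GPU, which matches the "some layers here, the rest there" behaviour described above.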