hoppp 7 days ago
It probably loads the entire model into RAM at once, while Ollama avoids this with a better loading strategy.
blooalien 7 days ago | parent
Yeah, if I remember correctly, Ollama loads models in "layers" and can place some layers in GPU VRAM and the rest in regular system RAM.
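The general idea can be sketched in a few lines: greedily assign leading layers to the GPU until a VRAM budget is exhausted, and leave the remainder in system RAM. This is a hypothetical illustration of the split, not Ollama's actual algorithm (the layer sizes, budget, and function name here are made up; in practice tools expose knobs like llama.cpp's `n_gpu_layers` for this):

```python
def split_layers(layer_sizes_mb, vram_budget_mb):
    """Greedily place leading layers on the GPU until the VRAM budget
    is exhausted; remaining layers stay in system RAM.
    Simplified sketch only -- not Ollama's real placement logic."""
    gpu_layers = 0
    used_mb = 0
    for size in layer_sizes_mb:
        if used_mb + size > vram_budget_mb:
            break  # this layer no longer fits in VRAM
        used_mb += size
        gpu_layers += 1
    return gpu_layers, len(layer_sizes_mb) - gpu_layers

# e.g. 32 transformer layers of ~500 MB each, with 8 GB of VRAM
gpu, cpu = split_layers([500] * 32, 8000)
print(gpu, cpu)  # 16 layers fit on the GPU, 16 stay in system RAM
```

Because inference walks the layers in order, only the CPU-resident tail needs to shuttle activations across the PCIe bus, which is why a partial offload still helps.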