tyfon 6 hours ago

I think the biggest advantage of ollama for me is the ability to "hotswap" models for different purposes without restarting the server, combined with the simplicity of "ollama pull model". In other words, it has been quite convenient.

This post prompted me to search a bit, and it seems that llama.cpp recently got router support[1], so I need to have a look at that.

My main use for this is a Discord bot where I have different models for different features: replying to messages with images/video or pure text, and non-reply generation of sentiment and image descriptions. These all perform best with different models, so it has been very convenient for the server to just swap models in and out on request.
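The swap-on-request pattern described above can be sketched against Ollama's HTTP API: the server loads whichever model a request names, so a bot only needs to route each task to a model name. This is a minimal sketch; the task labels and model names below are hypothetical placeholders, not ones given in this thread.

```python
import json

# Hypothetical task -> model routing table; the model names are
# illustrative placeholders, not recommendations from the thread.
MODEL_FOR_TASK = {
    "image_reply": "llava",    # a vision-capable model (assumed choice)
    "text_reply": "llama3.2",  # a general chat model (assumed choice)
    "sentiment": "gemma2:2b",  # a small, cheap model (assumed choice)
}

def build_ollama_request(task: str, prompt: str) -> dict:
    """Build a JSON payload for Ollama's /api/generate endpoint.

    Ollama loads the named model on demand (and unloads idle ones),
    which is the "hotswap" behaviour described above.
    """
    return {
        "model": MODEL_FOR_TASK[task],
        "prompt": prompt,
        "stream": False,
    }

# Each request simply names a different model; no server restart needed.
req = build_ollama_request("sentiment", "How does this message feel?")
print(json.dumps(req))
```

An actual call would POST this JSON to http://localhost:11434/api/generate; the point is that consecutive requests naming different models are handled by the same long-running server.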

[1] https://huggingface.co/blog/ggml-org/model-management-in-lla...

majorchord 6 hours ago | parent | next [-]

> the ability to "hotswap" models with different utility instead of restarting the server

The article mentions that llama-swap does this.

hacker_homie 5 hours ago | parent | prev | next [-]

Llama.cpp added the ability to load/switch models on demand with the max-models and models preset flags.

segmondy 6 hours ago | parent | prev | next [-]

You can do that with llama-server.

ekianjo 2 hours ago | parent | prev [-]

llama-server, which is part of llama.cpp, has done this for a few months now.