speedgoose 6 hours ago

I prefer Ollama over the suggested alternatives.

I will switch once the alternatives offer a good user experience for the simple features.

A new model is released on HF or the Ollama registry? One `ollama pull` and it's available. It's underwhelming? `ollama rm`.
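For reference, the whole cycle is just (model name illustrative):

```shell
# Fetch a model from the Ollama registry
ollama pull llama3.2

# Try it interactively
ollama run llama3.2

# Underwhelming? Remove the weights again
ollama rm llama3.2

# See what is still installed locally
ollama list
```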

kennywinker 6 hours ago | parent | next [-]

> This creates a recurring pattern on r/LocalLLaMA: new model launches, people try it through Ollama, it’s broken or slow or has botched chat templates, and the model gets blamed instead of the runtime.

Seems like maybe, at least some of the time, you're being underwhelmed by Ollama, not the model.

The better performance alone seems like a reason to switch away.

speedgoose 5 hours ago | parent [-]

I follow the llama.cpp runtime improvements, and the same is true for this project. They may rush a bit less, but you still have to wait a few days after a model release to get a working runtime with most features.

Maxious 5 hours ago | parent [-]

Model authors are welcome to add support to llama.cpp before release like IBM did for granite 4 https://github.com/ggml-org/llama.cpp/pull/13550

derrikcurran 4 hours ago | parent | prev | next [-]

`wget https://huggingface.co/[USER]/[REPO]/resolve/main/[FILE_NAME...`

`rm [FILE_NAME]`

With Ollama, the initial one-time setup is a little easier, and the CLI is useful, but is it worth dysfunctional templates, worse performance, and the other issues? Not to me.

Jinja templates are very common, and Jinja is not always losslessly convertible to the Go template syntax expected by Ollama. This means that some models simply cannot work correctly with Ollama. Sometimes the effects of this incompatibility are subtle and unpredictable.
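As a sketch of the kind of construct that doesn't translate cleanly: Jinja allows mutable state carried across loop iterations via `namespace`, which has no direct equivalent in Go's `text/template`. A chat-template fragment like this (illustrative, not from any particular model) is the sort of thing that gets mangled in conversion:

```jinja
{#- Jinja chat-template fragment: mutable loop state via namespace -#}
{%- set ns = namespace(system_prompt="") -%}
{%- for message in messages -%}
  {%- if message.role == "system" -%}
    {%- set ns.system_prompt = message.content -%}
  {%- endif -%}
{%- endfor -%}
{{ ns.system_prompt }}
```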

pheggs 5 hours ago | parent | prev | next [-]

you can pull directly from huggingface with llama.cpp, and it also has a decent web chat included
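If I remember the `-hf` flag correctly, pulling straight from the Hub looks something like this (repo name illustrative):

```shell
# Download a GGUF from Hugging Face and start the server with the web chat
# (repo name is illustrative; the file is cached locally for later runs)
llama-server -hf ggml-org/gemma-3-1b-it-GGUF

# Then open http://localhost:8080 for the built-in web UI,
# or hit the OpenAI-compatible API:
curl http://localhost:8080/v1/chat/completions
```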

speedgoose 5 hours ago | parent [-]

Does it have a model registry with an API and hot swapping, or do you still have to use something like llama-swap as suggested in the article? Or is it CLI only?

dminik 5 hours ago | parent [-]

You can have multiple models served now with loading/unloading with just the server binary.

https://github.com/ggml-org/llama.cpp/blob/master/tools/serv...

speedgoose 4 hours ago | parent [-]

It only lacks the automatic FIFO loading/unloading then. Maybe it will be there in a few weeks.

ekianjo 3 hours ago | parent | prev [-]

You have no idea what you are downloading with such a pull. At least LM Studio gives you access to all the different versions of the same model.

speedgoose 2 hours ago | parent [-]

https://ollama.com/library/gemma4/tags

I see quite a few versions, and I can also use hugging face models.
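For anyone unaware, the Hugging Face path is just a prefixed model name (repo and quant tag illustrative):

```shell
# Pull any GGUF repo from Hugging Face through Ollama
ollama run hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF

# A specific quantization can be picked with a tag
ollama run hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:Q4_K_M
```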