0xbadcafebee 6 hours ago

No mention of the fact that Ollama is about 1000x easier to use. Llama.cpp is a great project, but it's also one of the least user friendly pieces of software I've used. I don't think anyone in the project cares about normal users.

I started with Ollama, and it was great. But I moved to llama.cpp to have more up-to-date fixes. I still use Ollama to pull and list my models because it's so easy. I then built my own set of scripts to populate a separate cache directory of hardlinks so llama-swap can load the GGUFs into llama.cpp.
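For anyone curious, the hardlink trick is simple; here's a rough sketch of the kind of script I mean (paths and the flat-directory layout are illustrative — Ollama's real blob store uses content-addressed filenames, so a real version also has to map digests to model names):

```python
import os
import tempfile

def populate_cache(src_dir: str, cache_dir: str) -> list:
    """Hardlink every file from src_dir into cache_dir.

    Hardlinks cost no extra disk space, so llama-swap can point at
    cache_dir while Ollama keeps managing the originals.
    """
    os.makedirs(cache_dir, exist_ok=True)
    linked = []
    for name in os.listdir(src_dir):
        src = os.path.join(src_dir, name)
        dst = os.path.join(cache_dir, name)
        if os.path.isfile(src) and not os.path.exists(dst):
            os.link(src, dst)  # same inode, second directory entry
            linked.append(dst)
    return linked

# demo in a throwaway temp dir
root = tempfile.mkdtemp()
src = os.path.join(root, "blobs")
cache = os.path.join(root, "llama-swap-cache")
os.makedirs(src)
open(os.path.join(src, "model.gguf"), "w").close()
links = populate_cache(src, cache)
print(len(links))  # 1
```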

AndroTux 5 hours ago | parent | next [-]

Exactly. The blog post states that the alternatives listed are similarly intuitive. They are not. If you just need a chat app, then sure, there’s plenty of options. But if you want an OpenAI compatible API with model management, accessibility breaks down fast.

I’m open to suggestions, but the alternatives outlined in the blog post ain’t it.

mentalgear 5 hours ago | parent | next [-]

The reported alternatives seem pretty user-friendly to me:

> LM Studio gives you a GUI if that’s what you want. It uses llama.cpp under the hood, exposes all the knobs, and supports any GGUF model without lock-in.

> Jan (https://www.jan.ai/) is another open-source desktop app with a clean chat interface and local-first design.

> Msty (https://msty.ai/) offers a polished GUI with multi-model support and built-in RAG. koboldcpp is another option with a web UI and extensive configuration options.

API-wise: LM Studio exposes REST, OpenAI-compatible, and other APIs.

shantnutiwari 2 hours ago | parent [-]

All of those options were either too slow or didn't work for me (Mac with Intel). I could have spent hours googling, but I downloaded Ollama and it just worked.

So no, they are not alternatives to Ollama.

adrian_b an hour ago | parent | prev | next [-]

What you say was true in the past.

As other posters report, now llama-server implements an OpenAI compatible API and you can also connect to it with any Web browser.

I have not yet tried the OpenAI API, but it should eliminate the last Ollama advantage.

I do not believe that the Ollama "curated" models are significantly easier to use for a newbie than downloading the models directly from Huggingface.

On Huggingface you get much more detail about each model, which helps you navigate the jungle of countless model variants and find what is most suitable for you.

The fact criticized in TFA, that the Ollama "curated" list can be misleading about the characteristics of the models, is a very serious criticism in my view; it is enough for me to avoid such "curated" models.

I am not aware of any way of choosing and downloading the right model for local inference that is superior to using the Huggingface site directly.

I believe that choosing a model is the most intimidating part for a newbie who wants to run inference locally.

If a good choice is made, downloading the model, installing llama.cpp and running llama-server are trivial actions, which require minimal skills.

homarp 4 hours ago | parent | prev | next [-]

like someone said above: brew install llama.cpp

llama-server -hf ggml-org/gemma-3n-E4B-it-GGUF --port 8000 (with MCP support and web chat interface)

and you have OpenAI API on the same 8000 port. (https://github.com/ggml-org/llama.cpp/tree/master/tools/serv... lists the endpoints)
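Once it's running, you talk to it like any OpenAI-compatible server. A sketch of the chat-completions request body (the model name is illustrative; llama-server largely serves whatever model it loaded regardless of this field):

```python
import json

# Standard OpenAI-style chat request for llama-server's
# /v1/chat/completions endpoint on the port chosen above.
payload = {
    "model": "gemma",  # largely informational for llama-server
    "messages": [
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.7,
}
body = json.dumps(payload)
print(body)
# send with e.g.:
#   curl http://localhost:8000/v1/chat/completions \
#     -H "Content-Type: application/json" -d "$body"
```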

Philip-J-Fry 3 hours ago | parent | prev [-]

What do you mean?

LMStudio is listed as an alternative. It offers a chat UI, a model server supporting OpenAI, Anthropic and LMStudio API interfaces. It supports loading the models on demand or picking what models you want loaded. And you can tweak every parameter.

And it uses llama.cpp which is the whole point of the blog post.

myfakebadcode 18 minutes ago | parent | prev | next [-]

Least friendly you’ve used makes me think you’ve been spoiled. :)

Agreed, Ollama is a good intro, but once you move beyond it, it starts to be a pain.

kgeist 3 hours ago | parent | prev | next [-]

>No mention of the fact that Ollama is about 1000x easier to use

I remember that changing the context size from the unusable 2k default to something the model actually supports required creating a new model file in Ollama if you wanted the change to persist. (The other option was setting an env var before running Ollama; although, if you go that low-level route, why not just launch llama.cpp?) How was that easier? Did they change this?
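For reference, the persistent workaround was a custom Modelfile (the base model name here is just an example):

```
# Modelfile
FROM llama3
PARAMETER num_ctx 8192
```

followed by `ollama create llama3-8k -f Modelfile`, after which you had to run the new derived model instead of the original.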

I remember people complaining model X is "dumb" simply because Ollama capped the context size to a ridiculously small number by default.

IMHO modeling Ollama after Docker actually makes it harder for casual users, and power users will have an easier time with llama.cpp directly.

rowendduke 3 hours ago | parent | prev | next [-]

Not that it mattered much to me, but llama.cpp is way lighter and 10x smaller in size.

Resumable downloads also seem to work better in llama.cpp.

I love the inbuilt GUI.

I used Ollama first and honestly, llama.cpp has been a much better experience.

Maybe given enough time I would have seen the benefit of Ollama, but the inability to turn off updates, even after users requested it extensively, made me uninstall it. Postman PTSD is real.

flux3125 4 hours ago | parent | prev | next [-]

> so llama-swap can load

Just in case you haven't seen it yet, llama.cpp now has a router mode that lets you hot-swap models. I've switched over from llama-swap and have been happy with it.

BrissyCoder 5 hours ago | parent | prev | next [-]

> No mention of the fact that Ollama is about 1000x easier to use.

Easier than what?

I came across LM Studio (mentioned in the post) about 3 years ago, before I even knew what Ollama was. It was far better even then.

throw9393rj 5 hours ago | parent | prev | next [-]

I spent about 2 hours trying to get Vulkan acceleration working with Ollama, with no luck (half the models are not supported and crash it). With the llama.cpp podman container, it starts and works in 5 minutes.

Eisenstein 2 hours ago | parent | prev [-]

Koboldcpp is a single executable with a GUI launcher and a built-in web UI. It also supports TTS, STT, image gen, embeddings, music creation, and a bunch of other stuff out of the box, and can download and browse HF models from within the GUI. That's pretty easy to use.