otabdeveloper4 7 days ago

> Ollama lets you just install it, just install models, and go.

So does the original llama.cpp. And you won't have to deal with mislabeled models and insane defaults out of the box.

lxgr 6 days ago

Can it easily run as a server process in the background? To me, not having to load the LLM into memory for every single interaction is one of Ollama's big wins.

otabdeveloper4 6 days ago

Yes, of course it can.

lxgr 6 days ago

I wouldn't consider that a given at all, but there is indeed `llama-server`, which looks promising!
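
For anyone curious, here's a rough sketch of what talking to a resident `llama-server` instance might look like from Python, via its OpenAI-compatible HTTP API. The port (8080 is the default, as far as I know), host, and prompt are just placeholders:

    # Minimal sketch: query a llama-server started separately, e.g. with
    #   llama-server -m /path/to/model.gguf --port 8080
    # Port and model path here are illustrative assumptions, not canonical.
    import json
    import urllib.request

    req = urllib.request.Request(
        "http://127.0.0.1:8080/v1/chat/completions",
        data=json.dumps({
            "messages": [{"role": "user", "content": "Say hello."}],
        }).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    print(reply["choices"][0]["message"]["content"])

Because the server process stays resident, the model is loaded once and every subsequent request skips the load, which is exactly the win mentioned above.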

Then the only thing that's missing seems to be a canonical way for clients to instantiate that, ideally in some OS-native way (systemd, launchd, etc.), and a canonical port that they can connect to.
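
Until something canonical exists, a hand-rolled systemd user unit does the job. A minimal sketch, assuming the binary lives in /usr/local/bin; the model path and port are placeholders, not any kind of standard:

    # ~/.config/systemd/user/llama-server.service -- hypothetical unit;
    # binary location, model path, and port are illustrative, not canonical.
    [Unit]
    Description=llama.cpp server
    After=network.target

    [Service]
    ExecStart=/usr/local/bin/llama-server -m %h/models/model.gguf --host 127.0.0.1 --port 8080
    Restart=on-failure

    [Install]
    WantedBy=default.target

Enable it with `systemctl --user enable --now llama-server`, and any client that speaks the OpenAI API can point at localhost:8080. On macOS the equivalent would be a launchd plist.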