| ▲ | brcmthrowaway 11 hours ago |
| What is the difference between Ollama, llama.cpp, ggml and gguf? |
|
| ▲ | benob 11 hours ago | parent | next [-] |
| Ollama is a user-friendly UI for LLM inference. It is powered by llama.cpp (or a fork of it) which is more power-user oriented and requires command-line wrangling. GGML is the math library behind llama.cpp and GGUF is the associated file format used for storing LLM weights. |
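To make the GGUF part concrete: per the format spec in the ggml repository, every GGUF file begins with the ASCII magic "GGUF", followed by a uint32 version and (in recent versions) uint64 tensor and metadata-entry counts, all little-endian. A minimal sketch of parsing that fixed header, using a synthetic byte blob rather than a real model file:

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file

def read_gguf_header(data: bytes):
    """Parse the fixed GGUF header: magic, version, tensor count, metadata KV count."""
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    # little-endian: uint32 version, uint64 tensor_count, uint64 metadata_kv_count
    version, n_tensors, n_kv = struct.unpack_from("<IQQ", data, 4)
    return version, n_tensors, n_kv

# Build a tiny synthetic header (version 3, 2 tensors, 5 metadata entries) and parse it.
blob = GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 5)
print(read_gguf_header(blob))  # (3, 2, 5)
```

After the header come the metadata key/value pairs (architecture, tokenizer, quantization info) and the tensor descriptors; this sketch stops at the header.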
| |
| ▲ | redmalang 10 hours ago | parent [-] | | I've found llama.cpp (as I understand it, Ollama now uses its own fork of it) to work much better in practice: faster and much more flexible. |
|
|
| ▲ | xiconfjs 11 hours ago | parent | prev [-] |
| Ollama on macOS is a one-click solution with stable one-click updates. Happy so far; MLX support was the only missing piece for me. |
| |
| ▲ | yard2010 9 hours ago | parent [-] | | Can you please write about your hardware? | | |
| ▲ | xiconfjs 2 hours ago | parent [-] | | * macOS 26.x on MacBook Pro M1 Max, 32 GB
* Ollama on macOS, Cursor to play around
* Open WebUI [1] on my home server, talking to Ollama via its API (also for remote "AI" access)
* running gpt-oss:20b and qwen3.5:9b with ease, qwen3.5:27b for more complex tasks [1] https://github.com/open-webui/open-webui | | |
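The "via API to Ollama" piece above is just HTTP: Ollama serves a REST API on localhost port 11434 by default, and Open WebUI (or anything else) talks to it there. A minimal sketch of a non-streaming generate call, assuming a local `ollama serve` with the model already pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> bytes:
    """Encode a non-streaming generate request body for Ollama's REST API."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(model: str, prompt: str) -> str:
    """POST the prompt to a locally running Ollama and return the response text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires a running Ollama with the model pulled, e.g.:
#   print(generate("gpt-oss:20b", "Say hello in one word."))
```

Exposing that same port to a home server (as above) is what lets Open WebUI act as the remote front end.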
| ▲ | brcmthrowaway an hour ago | parent [-] | | Seems complicated. Switch to LMStudio | | |
| ▲ | xiconfjs 25 minutes ago | parent [-] | | I tried many times, but at least with its API active, LMStudio has some kind of memory leak that slows down the whole system (after ~1-2 days of uptime), even after unloading the model and quitting LMStudio, to the point where playing a 1080p video drops frames. No such issues with Ollama. |
|
|
|
|