or you can just load up ollama, have it load a local model and point claude or opencode at it...

is this article old? It's not. I'm not sure why he went through all the bother of llama.cpp

That was exactly my question too. Then I finished reading the post. The reason is pretty clear, and it's stated in the post: it is faster than ollama+mlx.

	sleepybrett 4 hours ago
		how much faster?