smcleod | 5 hours ago |
Somewhat related: on several occasions I've come across the claim that _"Ollama is just a llama.cpp wrapper"_, which is inaccurate and completely misses the point. I'm sharing my response here to avoid repeating myself.

With llama.cpp running on a machine, how do you connect your LLM clients to it and request that a model be loaded with a given set of parameters and templates? You can't, because llama.cpp is the inference engine, and its bundled llama-cpp-server binary only provides relatively basic server functionality; it's really more of a demo/example or MVP. llama.cpp is configured entirely when you run the binary, by manually passing command-line args for the one specific model and configuration you start it with. Ollama, by contrast, provides a server and client for interfacing with and packaging models, as sketched below.
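To make the distinction concrete, here's a minimal sketch of that client/server interaction against Ollama's HTTP API (assuming a local Ollama server on its default port 11434 and a model such as "llama3" that has already been pulled). The client names the model and per-request options in the request body and the server takes care of loading it; with bare llama.cpp there's no analogous request, since its server is launched with the model and parameters fixed on the command line.

```python
import json
import urllib.request

# Ask the Ollama server to run a generation with a named model and options.
# The server loads (and later unloads) the model weights as needed; nothing
# about the model was fixed when the server process was started.
payload = {
    "model": "llama3",                 # assumed model name, pulled beforehand
    "prompt": "Why is the sky blue?",
    "stream": False,                   # one JSON object instead of a stream
    "options": {                       # per-request runtime/sampling parameters
        "temperature": 0.2,
        "num_ctx": 4096,
    },
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```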
In addition to the llama.cpp engine, Ollama is working on adding further model backends (e.g. exl2, awq, etc.). Ollama is not "better" or "worse" than llama.cpp, because it's an entirely different tool.
spencerchubb | an hour ago |
I think what you just said actually reinforces the point that Ollama is a llama.cpp wrapper. I don't say that to disparage Ollama; in fact, I love it, and it's an impressive piece of software. If x uses y under the hood, then we say "x is a y wrapper".