>Oh does llama.cpp use MLX or whatever?

No. It runs on macOS but uses Metal instead of MLX.

zozbot234 13 hours ago | parent | next [-]

ANE-powered inference (at least for prefill, which is a key bottleneck on pre-M5 platforms) is also in the works, per https://github.com/ggml-org/llama.cpp/issues/10453#issuecomm...

OkGoDoIt 12 hours ago | parent | prev [-]

Is that better or worse?

irusensei 10 hours ago | parent [-]

Depends.

MLX is generally faster because it is more tightly integrated with Apple hardware. On the other hand, GGUF is a far more popular format, so more programs and a wider variety of models support it.

So it's kinda like having a very specific diet that you swear is better for you, but you can only order food from a few restaurants.

drob518 9 hours ago | parent [-]

But you can always fall back to GGUF while waiting for the world to build a few more MLX restaurants. Or something like that; the analogy is a bit stretched.