Don't buy the Mini or Studio. Both have the M4, which lacks the Neural Accelerators, so prompt processing is ~3-4x slower than on the M5.

mortenjorck 3 hours ago

I assume those don't just work automatically with an off-the-shelf gguf. What do you need in your local inference stack to take advantage of M5's neural accelerators?

	aurareturn 3 hours ago
		They do work with llama.cpp and MLX automatically.