I really recommend Qwen3.6 27B.

I ran some tests, and its 8-bit version runs at 30 tok/s with llama.cpp and MTP on a MacBook M5 Max. I have 128 GB, but 64 GB is plenty. https://github.com/stared/benching-local-llms-on-apple-silic...
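
For a rough sense of why 64 GB is enough, here is a back-of-the-envelope sketch. It assumes a nominal 8 bits per weight and counts only the weights; KV cache, context buffers and OS overhead come on top, and real 8-bit formats typically store a little more than 8 bits per weight. The function name is just illustrative.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# 27B parameters at a nominal 8 bits per weight (assumption: weights only)
weights_gb = weight_memory_gb(27, 8)
print(f"weights: ~{weights_gb:.0f} GB, left over on a 64 GB machine: ~{64 - weights_gb:.0f} GB")
```

So the weights take roughly 27 GB, which leaves a fair amount of a 64 GB machine for context and everything else.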

On benchmarks, it performs at roughly the level of SotA models from mid-to-late 2025.

iagooar 3 hours ago | parent | next [-]

I run the exact same model, on the exact same hardware - amazing results. Pair it with good search skills (Tavily, Brave, Exa) and you have a near-SOTA model on your desk.

wizzledonker 3 hours ago | parent | prev [-]

Did you mean 2025?

	stared 3 hours ago | parent [-]

	Yes, fixed.