| ▲ | stared 4 hours ago | |||||||
I really recommend Qwen3.6 27B. I ran some tests, and its 8-bit version runs at 30 tok/s with llama.cpp and MTP on a MacBook with an M5 Max. I have 128 GB, but 64 GB is plenty. https://github.com/stared/benching-local-llms-on-apple-silic... On benchmarks, it is more or less at the level of mid-to-late 2025 SotA. | ||||||||
| ▲ | iagooar 3 hours ago | parent | next [-] | |||||||
I run the exact same model on the exact same hardware, with amazing results. Pair it with good search skills (Tavily, Brave, Exa) and you have a near-SOTA model on your desk. | ||||||||
| ▲ | wizzledonker 3 hours ago | parent | prev [-] | |||||||
Did you mean 2025? | ||||||||