iagooar 3 hours ago
I love running two models locally: qwen3.6 27B 8bit (dense) and qwen3.6 35B 4bit (MoE). The 27B is the smarter, more reliable one, but it is slower. The 35B is faster and still very smart, but it sits below the 27B and is a bit less reliable. The speed difference comes from the MoE (Mixture of Experts) architecture, which activates only a subset of the parameters for each token, making the model much faster.

I run the 27B on a MacBook Pro with an M5 Max, 40 GPU cores and 128GB RAM (on this beast I can keep the 27B and the 35B in memory at the same time, with headroom for everything else). But because it is a laptop, I can't run local LLMs on it all the time: it just gets too hot and too loud.

What excites me more: I run the 35B model on a Mac mini M4 with 64GB RAM. It is fast and it gets a lot of work done (e.g. it scans, extracts and classifies my emails, watching the mailbox around the clock). I also use it as my private Hermes assistant ("when is the next Starship launch?", "who is playing today at the World Cup? Give me some trivia").

The next step I am planning is an RTX Pro 6000 Blackwell workstation that I can put in my basement. I want to run qwen really fast, with multiple threads / prompts / agents at once. And MAYBE, if the budget allows, a 2x RTX Pro 6000 setup to run DeepSeek v4 flash for research.
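For anyone who wants to check the "both models fit in memory" claim and see what "only activates a subset of parameters" means in practice, here is a rough sketch. It assumes weight memory is simply parameters times bits per weight (ignoring KV cache and runtime overhead), and the router uses made-up scores for 4 hypothetical experts with k=2. The function names `weight_gb` and `top_k_route` are illustrative only and the expert counts are not Qwen's actual configuration.

```python
import math


def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Assumption: ignores KV cache, activations and runtime overhead.
    """
    return params_billion * bits_per_weight / 8


def top_k_route(router_logits: list[float], k: int) -> dict[int, float]:
    """Toy MoE router: keep the k highest-scoring experts for one token
    and softmax-normalize their scores into mixing weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i],
                 reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


dense_gb = weight_gb(27, 8)  # 27B parameters at 8 bits per weight
moe_gb = weight_gb(35, 4)    # 35B parameters at 4 bits per weight
print(f"27B 8bit: {dense_gb} GB, 35B 4bit: {moe_gb} GB, "
      f"together: {dense_gb + moe_gb} GB")

# Hypothetical router scores for one token over 4 experts; only 2 are used,
# so only those experts' weights take part in the computation for this token.
chosen = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
print(sorted(chosen))
```

By this estimate the two sets of weights come to roughly 44.5 GB, which leaves plenty of room on a 128GB machine. The router part shows why the MoE model is faster per token: all experts have to sit in memory, but only the selected few do any work for a given token.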
Barbing 3 hours ago
Did you get a Brave search API key or something for that “Hermes”?
zerd 2 hours ago
I'd love an RTX 6000 Pro, but how can you justify it when it costs 10 years' worth of Claude Max?