I love running two models locally: qwen3.6 27B 8bit (dense) and qwen3.6 35B 4bit (MoE).

The 27B is the smarter, more reliable one - but it is slower. The 35B is faster and still very smart, though a notch below the 27B and a bit less reliable. The reason is the MoE (Mixture of Experts) architecture, which only activates a subset of the parameters for each token, making the model much faster.
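
To make the "only activates a subset of parameters" part concrete, here is a minimal sketch of top-k expert routing, the general idea behind MoE layers. The logits, expert count, and `k` are made-up illustration values, not Qwen's actual configuration, and `top_k_route` is a hypothetical helper, not a function from any library.

```python
import math

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and return (expert_index, weight) pairs.

    Only these k experts run for the token; the rest stay idle, which is
    why an MoE model does far less compute per token than its total
    parameter count suggests.
    """
    # Indices of the k largest router scores, highest first.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Hypothetical router scores for one token across 4 experts.
route = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
for expert, weight in route:
    print(f"expert {expert}: weight {weight:.3f}")
```

With 2 of 4 experts selected, only half of the expert parameters are touched for this token. Real models use many more experts, so the active share is much smaller.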

I run the 27B on a MacBook Pro M5 Max + 40 GPU cores + 128GB RAM (well, on this beast I can have 27B + 35B in memory at the same time with headroom for all the other stuff). But because this is a laptop, it is not possible to run local LLMs all the time - it just gets too hot and too loud.
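
A rough back-of-the-envelope check of why both models fit in 128GB: weight memory is roughly parameters times bits per weight divided by 8. This sketch counts weights only and assumes the nominal parameter counts and bit widths named above; KV cache, quantization scales, and runtime overhead come on top, so real usage will be higher. `weight_gb` is a hypothetical helper for illustration.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate weight size in GB (decimal): params * bits / 8 bytes."""
    return params_billion * bits_per_weight / 8

dense_27b_8bit = weight_gb(27, 8)  # 27B dense model at 8 bits per weight
moe_35b_4bit = weight_gb(35, 4)    # 35B MoE model at 4 bits per weight

print(f"27B @ 8bit: {dense_27b_8bit} GB")
print(f"35B @ 4bit: {moe_35b_4bit} GB")
print(f"both loaded: {dense_27b_8bit + moe_35b_4bit} GB of 128 GB")
```

Note that an MoE model still needs all of its experts in memory; the sparsity saves compute per token, not RAM.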

What excites me more: I run the 35B model on a MacMini M4 with 64GB RAM. It is fast, it gets a lot of work done (e.g. it scans, extracts and classifies my emails, it watches the mailbox all the time and does work). I also use it as my private Hermes assistant ("when is the next Starship launch?", "who is playing today at the World Cup? Give me some trivia").

The next step I am planning is an RTX Pro 6000 Blackwell workstation I can put in my basement. I want to run qwen really fast, with multiple threads / prompts / agents at once. And MAYBE, if the budget allows, a 2x RTX Pro 6000 setup in order to run DeepSeek v4 flash on it (to do research with it).

Barbing 3 hours ago

Did you get a Brave search API key or something for that “Hermes”?

	iagooar 2 hours ago
		Yes. Brave Search is one of those services I highly recommend paying for; the search they provide (similar to Exa, Tavily) is what makes an "OK LLM" become super smart.
	nickthegreek an hour ago
		I have mine set up with a SearXNG instance I run in Docker. Works great and costs zero.
	dghlsakjg 3 hours ago
		Hermes is just an agent that can be set up for whatever you want (coding or, more commonly, a personal assistant à la clawdbot). You can set it up with any of the standard tools and MCPs, like Brave or Tavily for search.

zerd 2 hours ago

I'd love an RTX 6000 Pro, but how can you justify it when it costs 10 years' worth of Claude Max?

	iagooar 2 hours ago
		10 years' worth of Claude Max at today's prices. Also - Anthropic recently removed a model I relied on and isn't giving it back. As a non-US citizen, I would rather pay in advance and be sure that I will keep having access to inference on my own terms. Also, it will just be faster - and more fun too.