Yes, Qwen3.6-35B-A3B on a Strix Halo 128GB (Bosgame M5).

I have way too much VRAM for such a model, but Qwen never released a 122B version of Qwen3.6, which would be the best class of model for my hardware. At the same time, my electricity bill is negligible: this is originally a laptop chip and it shows. It consumes almost nothing while idle and a little above 120W during prompt processing.

And Qwen3.6 has been surprisingly effective for me. I still use Claude occasionally, but only for about 10% of my needs, which lets me stay well under the quota even on the cheapest plan.

Speed: ~800 tps for prompt processing and 50 tps for token generation (no speculative decoding).

manmal 4 hours ago

Have you tried the 27B dense version? It’s way better for coding.

anana_ 4 hours ago

Unfortunately, on Strix Halo or any similar unified memory setup, dense models are gonna be dirt slow due to the limited memory bandwidth... But I agree, 27B is superior.
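
To put rough numbers on the bandwidth argument, here is a back-of-the-envelope sketch. It assumes token generation is purely memory-bandwidth bound, so every active weight is read once per token. The 256 GB/s figure (theoretical peak for a 256-bit LPDDR5X-8000 bus) and the ~4-bit quantization are my assumptions, not measurements, and the function name is made up for illustration. It ignores KV cache reads and other overhead, so real speeds will be lower.

```python
def decode_tps_upper_bound(active_params_b, bytes_per_param, bandwidth_gb_s):
    """Rough ceiling on tokens/s when decoding is memory-bandwidth bound.

    active_params_b: billions of parameters read per generated token
    bytes_per_param: storage per parameter (0.5 is roughly 4-bit quantization)
    bandwidth_gb_s:  memory bandwidth in GB/s
    """
    gb_read_per_token = active_params_b * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token


BANDWIDTH_GB_S = 256.0  # assumption: theoretical peak, not a measured value
BYTES_PER_PARAM = 0.5   # assumption: ~4-bit quantized weights

# Dense 27B: all 27B parameters are read for every token.
dense_27b = decode_tps_upper_bound(27, BYTES_PER_PARAM, BANDWIDTH_GB_S)

# 35B-A3B MoE: only ~3B parameters are active per token.
moe_a3b = decode_tps_upper_bound(3, BYTES_PER_PARAM, BANDWIDTH_GB_S)

print(f"dense 27B ceiling:   {dense_27b:.1f} tps")
print(f"MoE 3B-active ceiling: {moe_a3b:.1f} tps")
print(f"ratio: {moe_a3b / dense_27b:.0f}x")
```

Under these assumptions the MoE ceiling is 9x the dense one, simply because it reads 9x fewer weights per token.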

stymaar 3 hours ago

Exactly. That's why I'm disappointed there wasn't a 122B version: it would have been the 27B equivalent for Strix Halo users.