epolanski 2 hours ago
1. DeepSeek V4 is still in preview (training is not finished).
2. Qwen is much more demanding and borderline unusable on consumer hardware because it's a dense model: all 27B parameters are active for every token. It's not a MoE architecture, where a router activates only a subset of them.
3. Qwen doesn't like quantization at all.
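The dense-vs-MoE point can be made concrete with a toy top-k mixture-of-experts layer in pure Python. This is a minimal sketch under stated assumptions: the expert count, `top_k=2`, the dot-product router and the softmax taken over only the selected scores are illustrative choices (one common variant), not Qwen's or DeepSeek's actual configuration, and the "experts" are trivial scaling functions standing in for feed-forward blocks. It shows the one property the comment relies on: per token, only `top_k` experts do any work, whereas a dense layer touches all of its weights.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: Sequence[float],
    router_weights: Sequence[Sequence[float]],
    experts: Sequence[Callable[[Sequence[float]], Vector]],
    top_k: int = 2,
) -> Tuple[Vector, List[int]]:
    # Router: one score per expert (dot product of a router row with the token).
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Keep only the top_k experts; the others are never evaluated for this token.
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Gate weights: softmax over the selected scores only (illustrative variant).
    gates = softmax([scores[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        y = experts[i](x)
        out = [o + gate * yj for o, yj in zip(out, y)]
    return out, chosen


random.seed(0)
DIM, N_EXPERTS = 4, 8
router = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(N_EXPERTS)]
# Hypothetical toy experts: expert i just scales its input by (i + 1).
experts = [(lambda v, s=i + 1: [s * t for t in v]) for i in range(N_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
y, used = moe_forward(token, router, experts, top_k=2)
print(f"evaluated {len(used)} of {N_EXPERTS} experts: {used}")
print("output:", [round(v, 3) for v in y])
```

The catch is memory, not compute: the experts skipped for this token still have to be resident, because the next token may route to them.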
kgeist an hour ago
I have to disagree with most of these claims. I run Qwen3.6-27b at 260k context and get 40-60 tok/s. On our production tasks under OpenCode, it handles most coding problems as well as Sonnet 4.6. (As an experiment, I run the same prompts for the same issues in parallel through Qwen 3.6 and Sonnet 4.6, and I usually see little difference in performance.) In practice I see zero degradation from quantization. Settings: RTX 5090, 5-bit weights (Unsloth), FP8 KV cache. The last time I tried running large MoEs on this PC, they had worse quality at 2-3 bits than much smaller dense models at 5-6 bits, and they were slower anyway.
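To put the bit widths in numbers, here is a toy round-to-nearest quantizer with one absmax scale per group of 32 weights, plus the raw weight-memory arithmetic. It is a sketch under stated assumptions: the Gaussian test weights, the group size and the symmetric scheme are illustrative, and Unsloth's quants use more elaborate mixed-precision formats, so this does not reproduce their error or file sizes. It shows how reconstruction error grows as you drop from 5-6 bits to 2-3 bits, and why 27B weights at 5 bits fit on a 32 GB card with room left over for the KV cache.

```python
import random
from typing import List, Sequence


def quantize_dequantize(weights: Sequence[float], bits: int) -> List[float]:
    """Symmetric round-to-nearest quantization of one group with a shared absmax scale."""
    qmax = 2 ** (bits - 1) - 1  # 15 for 5 bits; only 1 for 2 bits (levels -1, 0, +1)
    scale = max(abs(w) for w in weights) / qmax
    if scale == 0:
        return list(weights)
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def mean_abs_error(weights: Sequence[float], bits: int, group_size: int = 32) -> float:
    err, n = 0.0, 0
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        deq = quantize_dequantize(group, bits)
        err += sum(abs(a - b) for a, b in zip(group, deq))
        n += len(group)
    return err / n


def weight_gib(n_params: float, bits_per_weight: float) -> float:
    # Ignores per-group scale storage, which pushes real formats somewhat above N bits/weight.
    return n_params * bits_per_weight / 8 / 2**30


random.seed(0)
w = [random.gauss(0.0, 0.02) for _ in range(4096)]  # hypothetical weight tensor
for bits in (2, 3, 5, 6):
    print(f"{bits}-bit: mean |error| = {mean_abs_error(w, bits):.5f}")
print(f"27B weights at 5 bits: {weight_gib(27e9, 5):.1f} GiB")
```

Real quants avoid the worst of the low-bit error by spending more bits on sensitive layers, so the 2-3 bit figures here are a pessimistic toy, not a measurement of any shipped model.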
trollbridge an hour ago
You can run the 35B A3B model, which is an MoE. It runs great on a 5090.
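A back-of-the-envelope comparison shows why an MoE like this feels fast on a single card. The sketch assumes roughly 2 FLOPs per active parameter per token (a standard forward-pass rule of thumb), about 3B active parameters as the A3B suffix conventionally indicates, a hypothetical 4-bit quant for the MoE (the comment doesn't say which one was used), and about 1,792 GB/s of memory bandwidth for the RTX 5090. The "decode ceiling" is an idealized bound from streaming the active weights once per generated token; it ignores KV-cache reads, attention and kernel overhead, so real throughput sits well below it.

```python
from dataclasses import dataclass


@dataclass
class ModelSpec:
    name: str
    total_params: float    # parameters that must be stored in VRAM
    active_params: float   # parameters actually used per token
    bits_per_weight: float

    def weight_gb(self) -> float:
        # All weights must be resident, including experts not used by this token.
        return self.total_params * self.bits_per_weight / 8 / 1e9

    def flops_per_token(self) -> float:
        # Rule of thumb: ~2 FLOPs (one multiply, one add) per active parameter.
        return 2 * self.active_params

    def decode_tok_s_ceiling(self, bandwidth_gb_s: float) -> float:
        # Idealized: each new token streams the active weights from memory once.
        active_gb = self.active_params * self.bits_per_weight / 8 / 1e9
        return bandwidth_gb_s / active_gb


BANDWIDTH_GB_S = 1792.0  # assumed RTX 5090 memory bandwidth

models = [
    ModelSpec("27B dense @ 5 bits", 27e9, 27e9, 5),
    ModelSpec("35B-A3B MoE @ 4 bits (hypothetical quant)", 35e9, 3e9, 4),
]
for m in models:
    print(
        f"{m.name}: weights ~{m.weight_gb():.1f} GB, "
        f"~{m.flops_per_token() / 1e9:.0f} GFLOPs/token, "
        f"decode ceiling ~{m.decode_tok_s_ceiling(BANDWIDTH_GB_S):.0f} tok/s"
    )
```

The trade-off cuts one way only: the MoE still needs all ~35B parameters in VRAM, so relative to the 27B dense model it buys per-token speed, not memory savings.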