arxell 8 hours ago
Each has its pros and cons. Dense models of equivalent total size obviously run slower, all else being equal; however, 35A3B is absolutely not 'a lot smarter'... in fact, if you set aside the slower inference rates, Qwen3.5 27B is arguably more intelligent and reliable. I use both regularly on a Strix Halo system. Just see the comparison table here: https://huggingface.co/unsloth/Qwen3.6-35B-A3B-GGUF . The problem you have to acknowledge if running locally (especially for coding tasks) is that your primary bottleneck quickly becomes prompt processing (NOT token generation), and there the differences between dense and MoE are variable and usually negligible.
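A back-of-envelope sketch of why the two phases bottleneck differently: prompt processing (prefill) is compute-bound, costing roughly 2 FLOPs per active parameter per prompt token, while token generation (decode) is memory-bandwidth-bound, since each generated token re-reads the active weights. All hardware and model numbers below are illustrative assumptions, not measured Strix Halo or Qwen specs:

```python
# Rough cost model for prefill vs decode. The constants are illustrative
# assumptions chosen for round numbers, not benchmarks of any real system.

ACTIVE_PARAMS = 3e9      # ~3B active parameters per token (an "A3B"-style MoE)
BYTES_PER_PARAM = 0.5    # ~4-bit quantized weights
COMPUTE_FLOPS = 10e12    # assumed usable compute throughput (FLOP/s)
MEM_BW = 200e9           # assumed usable memory bandwidth (bytes/s)

def prefill_seconds(prompt_tokens: int) -> float:
    # Compute-bound: ~2 FLOPs per active parameter per prompt token.
    return 2 * ACTIVE_PARAMS * prompt_tokens / COMPUTE_FLOPS

def decode_seconds(gen_tokens: int) -> float:
    # Bandwidth-bound: each generated token streams the active weights
    # from memory once.
    return gen_tokens * ACTIVE_PARAMS * BYTES_PER_PARAM / MEM_BW

# A large coding prompt vs a modest completion:
print(f"prefill 30k prompt tokens: {prefill_seconds(30_000):.1f} s")  # 18.0 s
print(f"decode 1k output tokens:   {decode_seconds(1_000):.1f} s")    # 7.5 s
```

With these assumed numbers, ingesting the prompt takes longer than generating the answer, and note that prefill cost scales with active parameters regardless of whether the rest of the model is dense or sparse, which is why MoE's decode-speed advantage doesn't rescue long-prompt coding workloads.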
nunodonato 6 hours ago
I was hoping this would be the model to replace our Qwen3.5-27B, but the difference is marginal. Too risky; I'll pass and wait for the release of a dense version.
Mikealcl 7 hours ago
Could you explain why prompt processing is the bottleneck, please? I've seen this behavior but I don't understand why.