mft_ 3 hours ago
The 27B model is dense, so it is relatively slow. The 35B-A3B model is marginally weaker, but being MoE it is much faster: roughly 4-8x faster in basic benchmarks on my M1 Max. For comparison, I just ran a couple of quick benchmarks (default settings) with llama-bench: Qwen3.6-35B-A3B at Q6_K_XL gave 858 t/s on pp512 (prompt processing) and 43 t/s on tg128 (token generation). Qwen3.6-27B at Q4_K_XL gave 103 t/s on pp512 and 8 t/s on tg128.
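For anyone who wants the ratios spelled out, here is a minimal sketch that computes the speedup from the four throughput figures quoted above. The dictionary layout and the `speedup` helper are illustrative assumptions, not llama-bench output or API. The two runs also use different quantizations (Q6_K_XL vs Q4_K_XL), so this is a rough comparison only.

```python
# Throughput in tokens/s, copied from the llama-bench numbers quoted above.
# The structure and names here are illustrative, not llama-bench's format.
results = {
    "Qwen3.6-35B-A3B Q6_K_XL": {"pp512": 858.0, "tg128": 43.0},
    "Qwen3.6-27B Q4_K_XL": {"pp512": 103.0, "tg128": 8.0},
}


def speedup(fast, slow):
    """Return the per-test throughput ratio fast/slow."""
    return {test: fast[test] / slow[test] for test in fast}


ratios = speedup(
    results["Qwen3.6-35B-A3B Q6_K_XL"],
    results["Qwen3.6-27B Q4_K_XL"],
)

for test, ratio in ratios.items():
    print(f"{test}: {ratio:.1f}x")
# pp512: 8.3x
# tg128: 5.4x
```

On these two runs the MoE model comes out about 8.3x faster at prompt processing and about 5.4x faster at token generation.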
pixelesque 2 hours ago
Thanks for the info.