Remix clone Hacker News


	▲	mft_ 3 hours ago
		The 27B model is dense, so it is relatively slow. The 35B-A3B model is marginally weaker but, being MoE (only about 3B parameters active per token), it is much faster: roughly 4-8x in basic benchmarks on my M1 Max. For comparison, I just ran a couple of quick benchmarks with llama-bench at default settings: Qwen3.6-35B-A3B at Q6_K_XL gave 858 t/s on pp512 (prompt processing) and 43 t/s on tg128 (token generation), while Qwen3.6-27B at Q4_K_XL gave 103 t/s on pp512 and 8 t/s on tg128.
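For anyone who wants the ratios spelled out, here is a minimal sketch that divides the quoted throughput figures. The numbers are the ones from the comment above; the dictionary layout and the `speedup` helper are made up for illustration and are not part of llama-bench. Note that the two runs use different quantizations (Q6_K_XL vs Q4_K_XL), so this is a rough comparison rather than a like-for-like one.

```python
# Throughput figures quoted in the comment above, in tokens per second.
results = {
    "Qwen3.6-35B-A3B Q6_K_XL": {"pp512": 858, "tg128": 43},
    "Qwen3.6-27B Q4_K_XL": {"pp512": 103, "tg128": 8},
}


def speedup(fast, slow, table):
    """Return the per-test throughput ratio fast/slow (hypothetical helper)."""
    return {test: table[fast][test] / table[slow][test] for test in table[fast]}


ratios = speedup("Qwen3.6-35B-A3B Q6_K_XL", "Qwen3.6-27B Q4_K_XL", results)
for test, ratio in ratios.items():
    print(f"{test}: {ratio:.1f}x")
```

On these figures the MoE model comes out about 8x faster at prompt processing and about 5x faster at token generation, which sits at the upper end of the 4-8x range mentioned.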
	▲	pixelesque 2 hours ago
		Thanks for the info.