Remix clone Hacker News


	▲	lettergram 12 minutes ago
		We actually found that Mistral Small 4, quantized to 4-bit, was comparable to Qwen 3.6 27B and is roughly the same size. At least in our experience with our use cases, quantizing the Mistral model worked far better than trying to quantize the Qwen family. I fully agree with your point though: Mistral in general is far behind where I'd expect, and Qwen in particular is crushing it at the smaller sizes. Personally, I'd consider anything 20B params and above a "medium" model, with small being <20B and large being >100B. Obviously we can go up to the huge 1-2T param models, but frankly the speed hit buys a surprisingly small accuracy improvement (1-2% on many metrics).
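The size buckets above could be written down roughly like this. It is only a sketch: `size_class` is a made-up helper name, and putting exactly 100B in "medium" is an assumption, since the comment only says large is >100B.

```python
def size_class(params_billions: float) -> str:
    """Bucket a model by parameter count, given in billions.

    Thresholds follow the comment above: small is <20B, large is >100B,
    and everything in between is medium. Treating exactly 100B as medium
    is an assumption, not something the comment states.
    """
    if params_billions < 20:
        return "small"
    if params_billions <= 100:
        return "medium"
    return "large"


if __name__ == "__main__":
    # A 27B model lands in the medium bucket; a 1-2T model is large.
    for size in (7, 27, 1000):
        print(size, size_class(size))
```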