The takeaway is that this is a smaller model that competes with Haiku. I would hope they come out with a "Sonnet"-competing model next, then an Opus one. I have been wondering why Microsoft is kind of "sleeping" on offering its own models on Copilot. Maybe it was part of their deal with OpenAI? Not sure.

mdasen 5 hours ago | parent | next [-]

Yes, it's a "smaller" (137B) model that competes with Haiku, but its performance is basically that of Qwen3.6-35B-A3B, which is about 75% smaller overall and 98% smaller in terms of active parameters (since the Qwen model is a mixture-of-experts model). Microsoft should be comparing its model to good smaller models, not Haiku 4.5.
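
To make the size comparison concrete, here is a rough sketch of the arithmetic. It assumes all 137B parameters of Microsoft's model are active per token (the comparison implies this but does not state it), and `percent_smaller` is just an illustrative helper name.

```python
def percent_smaller(small: float, large: float) -> float:
    """Return how much smaller `small` is than `large`, as a percentage."""
    return (1 - small / large) * 100


# Parameter counts in billions, taken from the model names in the comment.
MICROSOFT_TOTAL_B = 137  # assumption: dense, so also the active count
QWEN_TOTAL_B = 35        # the "35B" in Qwen3.6-35B-A3B
QWEN_ACTIVE_B = 3        # the "A3B" in Qwen3.6-35B-A3B

total_gap = percent_smaller(QWEN_TOTAL_B, MICROSOFT_TOTAL_B)
active_gap = percent_smaller(QWEN_ACTIVE_B, MICROSOFT_TOTAL_B)

print(f"total parameters:  {total_gap:.1f}% smaller")
print(f"active parameters: {active_gap:.1f}% smaller")
```

Under that assumption the total-parameter gap comes out slightly under the quoted 75%, and the active-parameter gap rounds to the quoted 98%.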

Qwen3.6-27B is closer to Claude Opus 4.7 than it is to Haiku 4.5 in a lot of benchmarks - and it's way smaller than Microsoft's new model.

Sure, it competes with Haiku, but it shows how far behind Microsoft is compared with lots of other small models that are available.

	stingraycharles 4 hours ago | parent [-]
		I understand what you’re saying, but I am generally very careful when comparing models and their benchmarks; benchmarks often don’t really match “real world” quality.

minraws 6 hours ago | parent | prev [-]

They did release MAI-Thinking-1 to compete with Sonnet. Not sure why that isn't at the top here.

	lostmsu 2 hours ago | parent | next [-]
		Compete? It is behind Kimi K2.6, which is in turn way behind Sonnet.
	giancarlostoro 6 hours ago | parent | prev [-]
		Good question, and I missed that entirely!