Huh, according to that model card this is a 137B total parameter model.

Performance doesn't seem that good:

- MAI-Code-1-Flash (137B-A5B) = 51% on SWE-bench Pro

- Qwen3.6-35B-A3B = 49.5% on SWE-bench Pro (https://huggingface.co/Qwen/Qwen3.6-35B-A3B)

They benchmark against Claude Haiku, but Haiku is not good: it's worse than tiny open models you can run locally or via API at 10% of the cost.

giancarlostoro 6 hours ago | parent | next [-]

The takeaway is that this is a smaller model that competes with Haiku. I would hope they come out with a "Sonnet"-competing model, then an Opus one. I have been wondering why Microsoft is kind of "sleeping" on offering its own models in Copilot. Maybe it was part of their deal with OpenAI? Not sure.

mdasen 5 hours ago | parent | next [-]

Yes, it's a "smaller" (137B) model that competes with Haiku, but it basically has the performance of Qwen3.6-35B-A3B, which is about 75% smaller in total parameters and 40% smaller in terms of active parameters (3B vs. 5B, since it's a mixture of experts model). Microsoft should be comparing its model to good smaller models, not Haiku 4.5.
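
For anyone who wants to check the size comparison, here is a rough sketch of the arithmetic. The parameter counts are simply read off the model names quoted in this thread (137B-A5B and 35B-A3B), and `percent_smaller` is a made-up helper name, not anything from either model card.

```python
def percent_smaller(small: float, large: float) -> float:
    """Return how much smaller `small` is than `large`, as a percentage."""
    return (1 - small / large) * 100


# Parameter counts in billions, taken from the model names in the thread
# (assumption: the "A5B" / "A3B" suffixes mean active parameters).
mai_total, mai_active = 137, 5    # MAI-Code-1-Flash (137B-A5B)
qwen_total, qwen_active = 35, 3   # Qwen3.6-35B-A3B

total_gap = percent_smaller(qwen_total, mai_total)
active_gap = percent_smaller(qwen_active, mai_active)
# Different comparison: Qwen's active count against MAI's total count.
active_vs_total = percent_smaller(qwen_active, mai_total)

print(f"total params:     {total_gap:.1f}% smaller")        # 74.5% smaller
print(f"active params:    {active_gap:.1f}% smaller")       # 40.0% smaller
print(f"active vs. total: {active_vs_total:.1f}% smaller")  # 97.8% smaller
```

Which baseline you divide by matters a lot here: active-to-active gives 40%, while comparing Qwen's 3B active against the full 137B gives roughly 98%.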

Qwen3.6-27B is closer to Claude Opus 4.7 than it is to Haiku 4.5 in a lot of benchmarks - and it's way smaller than Microsoft's new model.

Sure, it competes with Haiku, but it shows how far Microsoft is behind lots of other small models that are available.

stingraycharles 4 hours ago | parent [-]

I understand what you’re saying, but I am generally very careful when comparing models and their benchmarks; benchmarks often don’t really match “real world” quality.

minraws 6 hours ago | parent | prev [-]

They did release MAI-Thinking-1 to compete with Sonnet. Not sure why that isn't at the top here.

lostmsu 2 hours ago | parent | next [-]

Compete? It is behind Kimi K2.6, which is in turn way behind Sonnet.

giancarlostoro 6 hours ago | parent | prev [-]

Good question, and I missed that entirely!

kristjansson 5 hours ago | parent | prev | next [-]

> 137B-A5B

Yeah, not a 5B param model as the earlier title implied!
