camelmel 6 hours ago

Huh, according to that model card this is a 137B total parameter model.

Performance doesn't seem that good:

- MAI-Code-1-Flash (137B-A5B) = 51% on SWE-Bench Pro

- Qwen3.6-35B-A3B = 49.5% on SWE-Bench Pro (https://huggingface.co/Qwen/Qwen3.6-35B-A3B)

They benchmark against Claude Haiku, but Haiku is not good: it's worse than tiny open models you can run locally or via API at 10% of the cost.

giancarlostoro 6 hours ago | parent | next [-]

The takeaway is that this is a smaller model that competes with Haiku. I would hope they come out with a "Sonnet"-competing model, then an Opus one. I have been wondering why Microsoft is kind of "sleeping" on offering models they themselves have made on Copilot. Maybe it was part of their deal with OpenAI? Not sure.

mdasen 5 hours ago | parent | next [-]

Yes, it's a "smaller" (137B) model that competes with Haiku, but it has basically the performance of Qwen3.6-35B-A3B, which is about 75% smaller in total parameters and 40% smaller in terms of active parameters (3B vs. 5B, since it's a mixture-of-experts model). Microsoft should be comparing its model to good smaller models, not Haiku 4.5.
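For anyone who wants to check the size gap, here is a quick sketch of the arithmetic. It assumes the naming convention means what it usually does: "137B-A5B" is 137B total with 5B active, and "35B-A3B" is 35B total with 3B active. The helper name `percent_smaller` is just something made up for this example.

```python
def percent_smaller(small: float, large: float) -> float:
    """Return how much smaller `small` is than `large`, as a percentage."""
    return (1 - small / large) * 100


# Parameter counts in billions, read off the model names in this thread
# (assumption: "A5B" / "A3B" denote active parameters per token).
MAI_TOTAL, MAI_ACTIVE = 137, 5
QWEN_TOTAL, QWEN_ACTIVE = 35, 3

total_gap = percent_smaller(QWEN_TOTAL, MAI_TOTAL)
active_gap = percent_smaller(QWEN_ACTIVE, MAI_ACTIVE)

print(f"total parameters:  {total_gap:.1f}% smaller")
print(f"active parameters: {active_gap:.1f}% smaller")
```

So the gap in total parameters is large, while the gap in active parameters (which drives per-token compute) is much more modest.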

Qwen-3.6-27b is closer to Claude Opus 4.7 than it is to Haiku 4.5 in a lot of benchmarks - and it's way smaller than Microsoft's new model.

Sure, it competes with Haiku, but it shows how far Microsoft is behind lots of other small models that are available.

stingraycharles 4 hours ago | parent [-]

I understand what you’re saying, but I am generally very careful when comparing models and their benchmarks; benchmarks often don’t really match “real world” quality.

minraws 6 hours ago | parent | prev [-]

They did release MAI-Thinking-1 to compete with Sonnet. Not sure why that isn't at the top here.

lostmsu 2 hours ago | parent | next [-]

Compete? It is behind Kimi K2.6, which is in turn way behind Sonnet.

giancarlostoro 6 hours ago | parent | prev [-]

Good question, and I missed that entirely!

kristjansson 5 hours ago | parent | prev | next [-]

> 137B-A5B

Yeah, not a 5B param model as the earlier title implied!
