It's so weird to me that the benchmark scores remain so low while the models are marketed as revolutionary. And if you say that weak coding capabilities aren't a problem, tell that to the token price hike and the 'general use' model setup.

Why not sell it as a math agent? Why do I have to set up 4 agents to check each other's work?

▲

npn 4 hours ago | parent | next [-]

From what I understand, it's because, unlike the other models, MAI models haven't yet been fine-tuned on the synthetic datasets specifically designed to boost benchmark scores.

▲

redrove 6 hours ago | parent | prev [-]

It’s about bang for buck. That high a score for 5B params is pretty good, nigh unbelievable a short while ago.

It is my belief that smaller models will get better and better, and even cloud SOTA models will shrink.

Yet another reason the current buildout will feel like the railroads.

▲

necubi 5 hours ago | parent | next [-]

It's 5B active params in MoE, not 5B total params (total is 137B).
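To put the two numbers side by side, here is a rough back-of-the-envelope sketch. The 5B and 137B figures are the ones quoted above; the bytes-per-parameter values and the helper name `moe_footprint` are assumptions for illustration only, and the estimate covers weights alone (no KV cache, activations, or runtime overhead).

```python
def moe_footprint(total_params_b, active_params_b, bytes_per_param):
    """Return (active_fraction, total_weight_gb, active_weight_gb).

    Parameter counts are in billions, so billions of params times
    bytes per param gives gigabytes (10^9 bytes) directly.
    """
    active_fraction = active_params_b / total_params_b
    total_gb = total_params_b * bytes_per_param
    active_gb = active_params_b * bytes_per_param
    return active_fraction, total_gb, active_gb


TOTAL_B = 137.0   # total params in billions, as quoted in the thread
ACTIVE_B = 5.0    # active params per token in billions, as quoted in the thread

# Assumed storage precisions; real quantization formats add some overhead.
for label, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("4-bit", 0.5)]:
    frac, total_gb, active_gb = moe_footprint(TOTAL_B, ACTIVE_B, bytes_per_param)
    print(f"{label}: active share {frac:.1%}, "
          f"all weights ~{total_gb:.1f} GB, active weights ~{active_gb:.1f} GB")
```

Under these assumptions, the per-token compute looks like that of a 5B model, but the full set of weights is still what has to be stored somewhere.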

▲

bgirard 5 hours ago | parent | prev | next [-]

> It’s about bang for buck.

Hard to know when they don't give the price per token. Presumably it will be comparable to a low-to-mid-range model in terms of price. But otherwise their 'Ideal Zone' is meaningless without factoring in the price per token. I don't care how many tokens are being used; that's an implementation detail to me. I care about price / performance / latency.

▲

Flere-Imsaho 6 hours ago | parent | prev | next [-]

Yeah the future is probably a number of highly specialised small models you can run on your own hardware rather than massive frontier models in the cloud.

That's what I'm betting on anyway.

▲

girvo 3 hours ago | parent | next [-]

Step 3.7 Flash on my Asus GB10-based mini PC is incredibly close to that today. I’m very impressed, and that’s without MTP to boost performance.

▲

thewebguyd 6 hours ago | parent | prev | next [-]

That seems to be what Microsoft is betting on too, based on what was shown at the Build keynote today, plus the new Surface Ultra and the Surface mini PC with the new Nvidia chip. Nadella really played up local AI as the main use case they have in mind.

▲

search_facility 6 hours ago | parent | prev [-]

MoE basically works that way already: Qwen etc. with low active params (the A-number in the name) let you run inference on big models locally (only the active params have to fit into memory).

▲

dist-epoch 6 hours ago | parent | prev [-]

The SOTA models will not shrink, because the problems will get bigger, from "write me a C compiler" to "clone Stripe's business and run it".