jmward01 7 hours ago

Haiku not getting an update is becoming telling. I suspect we are reaching a point where the low-end models are cannibalizing the high end, and that isn't going to stop. How will these companies make money in a few years when even the smallest models are amazing?

qingcharles 2 hours ago | parent | next [-]

Google is putting a lot of research into small models. Most of my AI budget is now going to small models because I am doing lots of tiny tasks that the small models do great with. I would think a decent chunk of Goog's API revenue probably comes from their small models.

blixt 7 hours ago | parent | prev | next [-]

Isn't it pretty common for the smaller models to release a little while after the bigger ones, for all the big model providers?

jmward01 7 hours ago | parent [-]

The last update for Haiku was in October, which in startup land is ten years ago.

mvkel 7 hours ago | parent | prev | next [-]

It seems to be a rule that older models are more expensive than newer ones. The low-end models have a higher cost per token and worse output. I wonder if the move is to just have one model and quantize it when you hit compute constraints.
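For readers unfamiliar with the term: quantization stores model weights at lower numeric precision to cut memory and compute. A minimal illustrative sketch of symmetric per-tensor int8 quantization in plain Python follows; real serving stacks use more sophisticated schemes (per-channel scales, GPTQ, AWQ), but the core idea is the same:

```python
# Symmetric per-tensor int8 quantization: map floats in [-max_abs, max_abs]
# to integers in [-127, 127] plus a single scale factor.

def quantize_int8(weights):
    """Return (int8 values, scale) for a list of float weights."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize_int8(q, scale):
    """Reconstruct approximate floats from quantized values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.003, 1.27]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Reconstruction error is bounded by half the scale (step size / 2).
```

The trade-off the comment alludes to: serving the quantized weights costs roughly a quarter of the memory of fp32, at the price of the rounding error shown above.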

deaux 5 hours ago | parent [-]

> It seems to be a rule that older models are more expensive than newer ones.

It isn't. Gemini has gotten more expensive with each release. Anthropic has stayed pretty similar over time, no? When is the last time OpenAI dropped API prices? OpenAI started very high because they were first, so there was a ton of low-hanging fruit and plenty of room to drop.

mvkel 2 hours ago | parent [-]

I'm talking about gross margins, not revenue.

It's well known that GPT-4 is much more expensive to operate than the GPT-5 family.

Of course they won't drop the prices; it's pure profit if they make models more efficient.

dkhenry 7 hours ago | parent | prev [-]

The Gemma models are already at this point. A 31B model that can fit on a consumer card is as good as Sonnet 4.5. I haven't put it through as much on the coding or tool-calling front as I have the Claude or GPT models, but for text processing it is on par with the frontier models.

make3 6 hours ago | parent [-]

Absolutely not on par; you must be smoking something.

dkhenry 6 hours ago | parent | next [-]

You make a compelling argument, but thankfully I have data to back up my anecdotal experience.

This comparison shows them neck and neck: https://benchlm.ai/compare/claude-sonnet-4-5-vs-gemma-4-31b

As does this one: https://llm-stats.com/models/compare/claude-sonnet-4-6-vs-ge...

And the pelican benchmark even shows them pretty close: https://simonwillison.net/2026/Apr/2/gemma-4/ https://simonwillison.net/2025/Sep/29/claude-sonnet-4-5/

Also, this isn't a fringe statement; most people who have done an evaluation agree with me.

jmward01 4 hours ago | parent [-]

One area I find hard to get around is context length: everything self-hosted is so limited on length that it is marginal to use. Additionally, tools like Claude Code are clearly in the training mix for Anthropic's models, so they seem to get a boost over other models pushed into that environment. That being said, open-source local inference is -really- good and only going to get better. There is no doubt that the current frontier business model is not sustainable.

lostmsu 6 hours ago | parent | prev [-]

Just to be clear, did you notice the parent said 4.5?

cmorgan31 6 hours ago | parent [-]

They are also on par on a lot of classification tasks. I did have to actually fine-tune gemma4 a bit, but that is part of the value add.