Here is a trend I'm noticing:

- GPT-5 mini costs $0.25/$2 and will be discontinued in December.

- GPT-5.4 mini costs $0.75/$4.5 and is supposed to be the replacement.

- GPT-5.4 nano costs $0.2/$1.25 and, while it ranks better in benchmarks than GPT-5 mini, it's not even close when you test it in real scenarios.

So you're left being forced to go to GPT 5.4 mini if you use 5 mini today.
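
To put rough numbers on that, here is a minimal sketch. It assumes the prices above are USD per million input/output tokens, and the monthly token volumes are a made-up workload, not measured data:

```python
# Prices as quoted above, ASSUMED to be USD per 1M input / output tokens.
PRICES = {
    "gpt-5-mini": (0.25, 2.00),
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
}


def monthly_cost(model, input_tokens, output_tokens):
    """Return the USD cost of a workload for the given model."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# HYPOTHETICAL workload: 500M input tokens and 50M output tokens per month.
WORKLOAD = (500_000_000, 50_000_000)

for name in PRICES:
    print(f"{name}: ${monthly_cost(name, *WORKLOAD):,.2f}")
```

With that particular mix, moving from GPT-5 mini to GPT-5.4 mini takes the bill from $225 to $600 a month, roughly 2.7x. A different input/output ratio would shift the multiplier.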

The same thing is happening here, as their “Luna” model will cost $1/$6.

Can't we just stay with the models we actually want? I don't need GPT 5.4 mini. GPT-5 does the job.

Maybe it’s the realization that it was never that cheap in the first place and they're forcing us to upgrade in a slow and painful way.

▲

wolttam 4 hours ago | parent | next [-]

If you have no need for Anthropic/OpenAI's frontier model capability, you may be better served with an open-weight model that can't be taken away.

Edit:

> GPT-5 does the job.

I bring up DeepSeek V4 Flash a lot on HN, but I want to mention that according to Artificial Analysis, it trades blows with GPT-5 (high) (from August, 2025) [0]

[0]: https://artificialanalysis.ai/models/comparisons/deepseek-v4...

▲

lmf4lol an hour ago | parent | next [-]

We rolled out Deepseek V4 Flash to our customers and it was an absolute disaster, unfortunately. It was not able to follow simple commands, always "forgot" to do things, lied consistently about its work, and so on. It was pretty good, though, on one-off work, like summarizing something or executing simple commands, so we are now experimenting with using it for subagent work with clear instructions and hand-off.

Deepseek V4 Pro, on the other hand, is a really, really good main driver and we have had a lot of success using it. It's not Opus or GPT-5.5 level, but it's on its way. Kimi 2.6 as well, btw, so there is already quite some choice.

	▲	wolttam 30 minutes ago \| parent [-]
		I found Flash to be a bit shaky as well until I started using it in xhigh/max thinking effort, then it became my daily driver. It runs quite well on a couple of DGX Sparks. I still wish it was a little better, but there's hope for another model checkpoint (maybe with some of GLM 5.2's goodness distilled into it, that would be nice).

▲

RALaBarge 2 hours ago | parent | prev | next [-]

It’s my daily driver in opencode

▲

paxys 4 hours ago | parent | prev | next [-]

Unless you are hosting it yourself on your own infrastructure it absolutely can be taken away.

▲

atherton94027 4 hours ago | parent | next [-]

For all intents and purposes you'll be able to move an open weight model wherever you want.

I really dislike this rhetoric; you sound like the FSF guys who are like "you're not free until you're running coreboot with zero binary blobs". Sure, they have a point, but most people are fine running regular Linux.

▲

salviati 2 hours ago | parent | next [-]

Reading your comment made me realize that I love that the position of the FSF is held by someone, in the interest of stretching the Overton Window to that side.

▲

adrianN 4 hours ago | parent | prev | next [-]

Most FSF guys actually have very nuanced views on the topic and you’re doing everyone a disservice by reducing it to an extremist sound bite.

▲

jjmarr 42 minutes ago | parent | next [-]

That's literally the official FSF position.

https://www.fsf.org/resources/hw

> For example: the Free Software Foundation only purchases desktop machines which support Libreboot, and Thinkpad X200 and X60 laptops with Libreboot. All desktops and servers we buy are KGPE-D16 motherboards, which are supported by Libreboot. As a result, all of the workstations used by the FSF staff have a free BIOS.

https://www.gnu.org/distros/common-distros.html

> Except where noted, all of the distributions listed on this page fail to follow the guidelines in at least two important ways:

> ...The kernel that they distribute (in most cases, Linux) includes “blobs”: pieces of object code distributed without source, usually firmware to run some device.

They are extreme, uncompromising, and live by their principles.

They are also the reason you can buy a computer meeting those requirements instead of being a pipe dream.

▲

ffsm8 3 hours ago | parent | prev | next [-]

Thankfully he didn't say that they're all like that. Instead he pointed out the few that are as a well known example of similar behavior.

If you reread the comment with a fresh mind you'll notice that you misunderstood what he wrote

▲

citadel_melon an hour ago | parent [-]

When attacking archetypes of people, there is some responsibility to make clear who you’re attacking and why, even to someone who’s not being hyper-open-minded. At least if you want them to learn from you: which may or may not be your goal. When you attack/signal you’re on the offensive, it is foolish to believe that they won’t knee-jerk attack back and become closed minded at least a little.

Regardless, the “misinterpretation” of the parent comment is actually a plausible interpretation. I suspend my judgement on what the actual “correct” interpretation of the original comment is: there are too many plausible interpretations to decide deductively. But I do know that since the first comment brought up a contentious issue, its author should have put more work into crafting their message so there aren’t so many plausible interpretations that contradict each other. Or alternatively, they should have specified precisely who they were talking about, without a shadow of a doubt. That is, if the commenter cared to be properly interpreted, but that may not be their goal. There are many reasonable reasons why that wouldn’t be their goal.

	▲	morgoo 27 minutes ago \| parent \| next [-]
		You used a lot of words to defend a strawman argument
	▲	verve_rat 28 minutes ago \| parent \| prev \| next [-]
		When you read someone's comment there is some responsibility to read the words they wrote and not attempt to attack them for an argument no reasonable person would extract from those words.
	▲	NamlchakKhandro 39 minutes ago \| parent \| prev [-]
		Angry girlfriend SMS essay

▲

charcircuit 2 hours ago | parent | prev [-]

It is the FSF itself who has these extremist views.

▲

sauwan 4 hours ago | parent | prev [-]

Unless the US Gov bans inference companies from serving Chinese models to US customers...

	▲	tancop 3 hours ago \| parent [-]
		good luck doing it to inference companies in singapore or the netherlands. or one of the decentralized networks that dont look useful right now. the world is already sick of america acting like it can do whatever and force their rules on the rest of us.

▲

GTP 4 hours ago | parent | prev | next [-]

Still, with the same model being served by multiple providers, it is much less likely to disappear entirely, even if you would like to keep using a cloud provider. Worst-case scenario, you change providers. Or you use OpenRouter as a proxy.
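
A minimal sketch of that "change providers" fallback, assuming each provider can be wrapped as a simple function from prompt to completion. The provider names and functions here are hypothetical stand-ins, not any real client API:

```python
from typing import Callable, Sequence, Tuple

# Each provider is modeled as a plain callable: prompt -> completion text.
# Real code would wrap an HTTP client here; these are HYPOTHETICAL stand-ins.
Provider = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when no provider in the list could serve the request."""


def complete_with_fallback(
    prompt: str, providers: Sequence[Tuple[str, Provider]]
) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, completion) from the first that works."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def dropped_model(prompt: str) -> str:
    # Simulates a provider that stopped offering the model.
    raise ConnectionError("model no longer offered")


def still_serving(prompt: str) -> str:
    # Simulates a provider that still hosts the same open-weight model.
    return f"echo: {prompt}"


print(complete_with_fallback(
    "hello", [("provider-a", dropped_model), ("provider-b", still_serving)]
))
```

The point of the ordered list is that the application only depends on the model, not on any single host. This only helps while at least one provider in the list still serves the model.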

▲

dgellow 3 hours ago | parent | prev | next [-]

There is actual market competition to host open models. If one provider stops offering a model you likely can find another provider that will

	▲	an hour ago \| parent [-]
		[deleted]

▲

theptip 3 hours ago | parent | prev | next [-]

No. As long as you downloaded the weights, you can run them somewhere.

▲

amunozo 4 hours ago | parent | prev | next [-]

But you have multiple providers, not just one.

▲

paxys 4 hours ago | parent | next [-]

And every single one of those providers would buckle under government pressure.

Fable itself is hosted on all major cloud providers. How many offer it today?

▲

eli 3 hours ago | parent | next [-]

This seems a little fanciful.

There's really no comparison between a model that Anthropic allows Google and Amazon to host with one that has been downloaded hundreds of thousands of times and has dozens of public inference providers.

	▲	Art9681 an hour ago \| parent [-]
		I don't think they "allow" Google or Amazon to host them so much as Anthropic itself is deploying and managing their services on multiple cloud providers just like every other global scale business. Even the models served via OpenRouter are just being routed to compute under Anthropic control. Same with OpenAI. They aren't going to hand the world's most valuable intellectual property at the moment to some third party to run independently. Now for the Chinese models on OpenRouter, yea. Those providers could be legit. Or it could be a failed crypto mining operation pivoting to providing AI compute. Who knows.

▲

minimaxir 4 hours ago | parent | prev | next [-]

The providers on OpenRouter are not all in the US.

	▲	paxys 4 hours ago \| parent [-]
		That doesn’t mean they are immune to US laws. If they want to continue to operate in the largest market in the world they will fall in line. And if you are a legit American business you aren’t going to illegally bypass import/export controls.

▲

svachalek 4 hours ago | parent | prev [-]

More importantly, the download is out there. You can download it yourself today, and if it's that important to you, you can buy the hardware too.

▲

cyanydeez 4 hours ago | parent | prev [-]

I'm sure he's referring to the tightening of internet controls around social media as an extrapolation to controlling websites, etc.

	▲	logicchains 4 hours ago \| parent [-]
		Even in that case it can't be taken away; GPT and Claude are banned in China yet there's still a huge black market for tokens.

▲

supern0va 3 hours ago | parent | prev | next [-]

>Unless you're running Linux yourself, it can absolutely be taken away.

▲

Zambyte 2 hours ago | parent [-]

Yes. The difference is obviously that full, fat Linux runs on a superset of anything a layperson would call a computer, and can be built from source on roughly the same set of hardware. The full, fat Deepseek (as in the 1.6T model, unquantized) is too big to run on anything a layperson would call a computer, and actually building it is even harder.

	▲	supern0va 25 minutes ago \| parent [-]
		It's famously difficult to find people willing to rent you time on big computers over the internet.

▲

GaggiX 4 hours ago | parent | prev [-]

Popular open models on Openrouter have dozens of providers.

▲

ai_fry_ur_brain 31 minutes ago | parent | prev [-]

Deepseek V4 Flash is actually useless. Sorry, I tested it after seeing so many comments like these. On OpenRouter, when I tried to get it to output tool calls for creating tables, instead of providing the structured output correctly it was sending me people's Dropbox links and other image-sharing site URLs that led to pictures of random tables...

LLMs seem to only impress a certain type of person. Hint: this type of person was also really excited about NFTs.

▲

paxys 5 hours ago | parent | prev | next [-]

It’s the same as the SaaS model. Price keeps going up, and to justify it they keep forcing you to upgrade to new versions with features that nobody asked for.

▲

theptip 3 hours ago | parent [-]

“More intelligence” is the new feature. Almost everyone is asking for this.

Citation: have you looked at OAI and Anthropic’s customer growth numbers?

▲

paxys 2 hours ago | parent [-]

Every use case of every customer doesn’t need more intelligence. I’m willing to bet that the vast majority will be perfectly fine running on “low intelligence” at a cheap price forever.

	▲	theptip 31 minutes ago \| parent [-]
		I for sure agree that plenty of current use-cases are solvable by non-frontier models. However, you said “new versions with features that nobody asked for”, and I would prefer that you concede the point before shifting to arguing a new point. What customers are asking for is smarter models. Because the tasks that only smarter models can solve are higher value, higher margin, than the tasks that non-frontier models can solve.

▲

mchusma 3 hours ago | parent | prev | next [-]

I've struggled with this. You definitely can have great cheap models. There are many of them open source and served profitably by neo-clouds. The big labs have basically given up on cheap models, and it is frustrating. It means applications are not likely to build as much on them anymore (we are shifting workloads from Haiku/Sonnet to Deepseek v4, for example).

I suspect the problem is that they need to charge a lot to keep revenue numbers up, and they are more worried about cannibalizing themselves than others cannibalizing them.

▲

neosat 4 hours ago | parent | prev | next [-]

Good observations. There's definitely a trend in pricing increasing but also balanced by innovations and availability of other models (both open and closed) emerging as alternatives. It's natural for the labs to explore how much they can push pricing, and for competitors to explore how they can treat that margin as their opportunity to grow their business.

Eventually the pricing should be more stable.

	▲	benterix 4 hours ago \| parent [-]
		> Eventually the pricing should be more stable. Why do you think so? This game can be played forever, you just need strong marketing and orgs gullible enough to pay a higher price for a minor upgrade.

▲

mistic92 4 hours ago | parent | prev | next [-]

It's happening to Anthropic Haiku and Gemini Flash/Flash Lite too. All of them are increasing prices and deprecating cheap models.

▲

hadlock 2 hours ago | parent | prev | next [-]

Each model release gives an opportunity to reduce the number of old models still on offer, and to charge for a higher, less-subsidized tier. The trick is to charge a subsidized price that is less than an M3 Ultra, so customers keep paying you rent instead of a one-time fixed cost. So far open models can't compete with Opus 4.5, but as soon as one can, people will be looking at buying devices that can run that model locally.

We are a Claude shop but we already bought two Mac Studios to start migrating less complex but still agentic workflows there. We will break even on those in less than a year.

▲

simonw 4 hours ago | parent | prev | next [-]

On Nano "it's not even close when you test it in real scenarios" - what have you seen? What kind of things can GPT-5 Mini handle that GPT-5.4 Nano cannot?

	▲	isamu_2000 4 hours ago \| parent [-]
		We’re using GPT-5-mini in an enterprise data-processing workflow, and we too see that GPT-5.4 nano performs materially worse for our requirements, roughly 30% worse as measured through our test suite.

▲

btbuildem 2 hours ago | parent | prev | next [-]

> stay with the models we actually want

If you want control over the models you use, you have to self-host.

▲

CSMastermind 2 hours ago | parent | prev | next [-]

5.5 is smart enough for 99% of my tasks. I need that level of intelligence at ever decreasing prices.

▲

mips_avatar 3 hours ago | parent | prev | next [-]

I think it's more that they're abandoning simpler AI tasks to Chinese models. Qwen 35b and Deepseek Flash are better than GPT-5 mini on my tasks and way cheaper.

▲

malnourish 4 hours ago | parent | prev | next [-]

Hardware hosting old models isn't hosting new models. If you want consistent models, host your own open weights ones.

▲

3 hours ago | parent | prev | next [-]

[deleted]

▲

theptip 3 hours ago | parent | prev | next [-]

> Maybe it’s the realization that it was never that cheap in the first place and they're forcing us to upgrade in a slow and painful way.

All the analysis I have seen points to frontier models being profitable to serve. It’s using 50% or more of your GPUs for research plus CapEx for capacity expansion that makes these businesses so heavily cash-negative.

What you are observing is downstream of another detail. It gets more expensive to serve a model as utilization goes down. Plus the opportunity cost vs newer, more-profitable models.

There are plenty of valid reasons to critique here. “OpenAI is lying about this being a sustainable price to serve” is not one of them.

▲

4 hours ago | parent | prev | next [-]

[deleted]

▲

sourcecodeplz 4 hours ago | parent | prev | next [-]

who tf would use mini when you have dsv4 flash

▲

tosh 4 hours ago | parent | prev | next [-]

discontinuing the cheaper options is a risky move for openai

will trigger re-evaluations of models by other labs + inference providers

	▲	HyperL0gi 3 hours ago \| parent [-]
		I can speak for myself. We are exactly at this moment trying to replace GPT 5 mini with an open weight / open source model. No luck so far.

▲

gonzalohm 4 hours ago | parent | prev | next [-]

Yeah, this is the classic Silicon Valley strategy of selling at a loss and then inflating prices once they have captured the market.

See Uber, Netflix, etc.

▲

CraigRood 3 hours ago | parent | next [-]

I don't see them capturing anything at this point. If inference were profitable, they could compete on price per model and capture the market, then increase prices and pay back the model training.

Feels like they are just pulling in as much as they can whilst competing on capabilities instead. At which point it's a case of who can last the longest.

Doesn't feel like Uber/Netflix.

▲

simianwords 4 hours ago | parent | prev [-]

This is a constantly repeated conspiracy theory and is not true at all. The API costs do increase but aggregate costs per task decrease. The question is: do people need lower intelligence models at all? The answer is a resounding NO!

How many people do you see using Haiku or Sonnet? I see very few; most people default to the latest model and just play with thinking effort. I think three layers are good enough and supporting more is not a good UX.

	▲	phainopepla2 3 hours ago \| parent \| next [-]
		Are you only considering coding use cases? Many enterprise use cases, such as simple data extraction, are well served by cheaper models.
	▲	gonzalohm 4 hours ago \| parent \| prev \| next [-]
		Do I need the most intelligent model to generate boilerplate code, which is my main usage for AI? Resounding No. For my use case a model from a year ago is good enough
	▲	unknownfuture 3 hours ago \| parent \| prev [-]
		I... use them all the time: plan with a more advanced model, build with a cheaper one. Anthropic literally packages a metamodel (opusplan) for that pattern. Also: calling the SV blitzscaling strategy of using VC money to fund loss-leader products, with the goal of building a monopoly via dumping, a conspiracy is quite the position given there are entire books written on the topic...

▲

cyanydeez 4 hours ago | parent | prev [-]

No, you can't. These companies have two infrastructures: model training and model inference.

Inference needs to cache, and it can't cache random model data, so it's essentially dedicated; it can't spin up models on demand, it has to know what demand is coming.

These companies are going to end up with very few models offered and that's probably generous. They might end up with just one model and you pay for removing its safeguards.