IMTDb 2 days ago

A key difference is that the cost to execute a cab ride has largely stayed the same. Gas to get you from point A to point B is ~$5, and there's a floor on what you can pay the driver. If your ride costs $8 today, you know that's unsustainable; it'll eventually climb to $10 or $12.
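
As a rough sketch of that floor argument (the $5 gas and $8 fare are the figures above; the driver-pay floor is an illustrative assumption):

    # Back-of-envelope for the cab analogy: the ride has a hard cost
    # floor, so any price below it is subsidized and has to rise.
    gas = 5.00            # ~fuel cost for the trip (from the comment)
    driver_floor = 6.00   # hypothetical minimum the driver must be paid
    price_today = 8.00    # what the subsidized ride costs now

    cost_floor = gas + driver_floor            # $11.00
    print(price_today < cost_floor)            # True -> unsustainable
    print(f"fare must climb to at least ${cost_floor:.2f}")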

But inference costs are dropping dramatically over time, and that trend shows no signs of slowing. So even if a task costs $8 today thanks to VC subsidies, I can be reasonably confident that the same task will cost $8 or less without subsidies in the not-too-distant future.

Of course, by then we'll have much more capable models. So if you want SOTA, you might see the jump to $10-12. But that's a different value proposition entirely: you're getting significantly more for your money, not just paying more for the same thing.

lompad 2 days ago | parent | next [-]

> But inference costs are dropping dramatically over time,

Please prove this statement; so far there is no indication that it's actually true - the opposite seems to be the case. Here are some actual numbers [0] (and whether you like Ed or not, his sources have so far always been extremely reliable).

There is a reason the AI companies don't ever talk about their inference costs. They boast about everything they can find, but inference... not.

[0]: https://www.wheresyoured.at/oai_docs/

patresh a day ago | parent | next [-]

I believe OP's point is that for a given model quality, inference cost decreases dramatically over time. The article you linked talks about effective total inference costs, which seem to be increasing.

Those are not contradictory: a company's inference costs can increase due to deploying more models (Sora), deploying larger models, doing more reasoning, and an increase in demand.

However, if we look purely at how much it costs to run inference on a fixed number of requests at a fixed model quality, I am quite convinced that inference costs are decreasing dramatically. Here's a model from late 2025 [1] (see the Model performance section) with benchmarks comparing a 72B-parameter model (Qwen2.5) from early 2025 to the late-2025 8B Qwen3 model.

The 9x smaller model outperforms the larger one from earlier the same year on 27 of the 40 benchmarks they were evaluated on, which is just astounding.

[1] https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct
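
To put that shrinkage in cost terms, a back-of-envelope sketch; treating per-token compute as proportional to parameter count is a first-order assumption (it ignores batching, memory bandwidth, quantization, etc.):

    # First-order estimate: a dense transformer's forward pass costs
    # roughly 2 * params FLOPs per token, so per-token compute scales
    # with parameter count.
    qwen25_params = 72e9   # Qwen2.5-72B, early 2025
    qwen3_params = 8e9     # Qwen3-VL-8B, late 2025

    flops_ratio = (2 * qwen25_params) / (2 * qwen3_params)
    print(f"~{flops_ratio:.0f}x less compute per token")  # ~9x
    # ...for a model that wins 27 of the 40 shared benchmarks.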

academia_hack a day ago | parent | prev [-]

++

Anecdotally, I find you can tell if someone worked at a big AI provider or a small AI startup by proposing an AI project like this:

"First we'll train a custom trillion parameter LLM for HTML generation. Then we'll use it to render our homepage to our 10 million daily visitors."

The startup people will be like "this is a bad idea because you don't have enough GPUs for training that LLM" and the AI lab folks will be like "How do you intend to scale inference if you're not Google?"

SecretDreams 2 days ago | parent | prev | next [-]

> But inference costs are dropping dramatically over time, and that trend shows no signs of slowing. So even if a task costs $8 today thanks to VC subsidies, I can be reasonably confident that the same task will cost $8 or less without subsidies in the not-too-distant future.

I'd like to see this statement plotted against current trends in hardware prices ISO performance. RAM, for example, is not meaningfully better than it was two years ago, and yet it's 3x the price.

I fail to see how costs can drop while valuations for all major hardware vendors continue to go up. I don't think the markets would price companies in this way if they thought all major hardware vendors were going to see margins shrink to commodity levels, as you've implied.

santadays 2 days ago | parent | next [-]

I've seen the following quote.

"The energy consumed per text prompt for Gemini Apps has been reduced by 33x over the past 12 months."

My thinking is that if Google can give away LLM usage (which is obviously subsidized) it can't be astronomically expensive - it must be in the realm of what we are paying for ChatGPT. Google has their own TPUs and company culture oriented towards optimizing the energy usage/hardware costs.

I tend to agree with the grandparent on this: LLMs will get cheaper for today's level of intelligence, and more expensive for SOTA models.
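
For a sense of scale, here's what that quote implies if (hypothetically) the 33x gain compounded evenly over the year instead of arriving in a few big jumps:

    # Implied steady monthly improvement behind "33x in 12 months".
    total_gain = 33.0
    months = 12
    monthly_gain = total_gain ** (1 / months)
    print(f"~{monthly_gain - 1:.0%} less energy per prompt, month over month")
    # ~34% improvement every single month, sustained for a year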

lelanthran 2 days ago | parent | next [-]

Google is a special case - ever since LLMs came out I've been pointing out that Google owns the entire vertical.

OpenAI, Anthropic, etc are in a race to the bottom, but because they don't own the vertical they are beholden to Nvidia (for chips), they obviously have less training data, they need a constant influx of cash just to stay in that race to the bottom, etc.

Google owns the entire stack - they don't need nvidia, they already have the data, they own the very important user-info via tracking, they have millions, if not billions, of emails on which to train, etc.

Google needs no one, not even VCs. Their costs must be a fraction of the costs of pure-LLM companies.

viraptor 2 days ago | parent | next [-]

> OpenAI, Anthropic, etc are in a race to the bottom

There's a bit of nuance hiding in the "etc". OpenAI and Anthropic are still in a race for the top results. MiniMax and GLM are in the race to the bottom while chasing good results - M2.1 is 10x cheaper than Sonnet, for example, but practically fairly close in capabilities.

lelanthran a day ago | parent [-]

> There's a bit of nuance hiding in the "etc". OpenAI and Anthropic are still in a race for the top results.

That's not what is usually meant by "race to the bottom", is it?

To clarify, in this context I mean that they are all in a race to be the lowest margin provider.

They're at the bottom of the value chain: they sell tokens.

It's like being an electricity provider: if you buy $100 of electricity and produce 100 widgets, which you sell for $1k each, that margin isn't captured by the provider.

That's what being at the bottom of the value chain means.
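
Spelled out with the figures from the analogy:

    # Commodity providers sit at the bottom of the value chain: the
    # margin on the widgets never flows back to them.
    electricity_bill = 100             # what the widget maker pays
    widgets_sold = 100
    widget_price = 1_000

    downstream_revenue = widgets_sold * widget_price   # $100,000
    provider_share = electricity_bill / downstream_revenue
    print(f"provider captures {provider_share:.1%} of downstream revenue")  # 0.1%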

viraptor a day ago | parent [-]

I get what it means, but it doesn't look to me like they're trying that yet. They don't even care that people buy multiple top-tier plans and rotate them every week, because they don't provide a high enough tier for the existing customers. I don't see any price war happening. We don't know what their real margins are, but I don't see the race there. What signs do you see that Anthropic and OpenAI are in the race to the bottom?

lelanthran a day ago | parent [-]

> I don't see any price war happening. What signs do you see that Anthropic and OpenAI are in the race to the bottom?

There don't need to be signs of a race (or a price war), only signs of commodification; all you need is a lack of differentiation between providers for something to turn into a commodity.

When you're buying a commodity, there's no big difference between getting your commodity delivered by $PROVIDER_1 and getting your commodity delivered by $PROVIDER_2.

The models are all converging quality-wise. Right now the number of people who swear by OpenAI models is about the same as the number who swear by Anthropic models, which is about the same as the number who swear by Google's models, etc.

When you're selling a commodity, the only differentiation is in the customer experience.

Right now, sure, there's no price war, but almost everyone who is interested is playing with multiple models anyway. IOW, the target consumers are already treating LLMs as a commodity.

flyinglizard 2 days ago | parent | prev [-]

Gmail has 1.8b active users, each with thousands of emails in their inbox. The number of emails they can train on is probably in the trillions.

brokencode 2 days ago | parent [-]

Email seems like not only a pretty terrible training data set, since most of it is marketing spam with dubious value, but also an invasion of privacy, since information could possibly leak about individuals via the model.

palmotea 2 days ago | parent [-]

> Email seems like not only a pretty terrible training data set, since most of it is marketing spam with dubious value

Google probably even has an advantage there: filter out everything except messages sent from one valid Gmail account to another. If you do that, you drop most of the spam and marketing and are left with mostly human-to-human interactions. And on top of that, they have their spam filters.
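
A minimal sketch of that heuristic; the `Message` type and the `is_active_gmail_account` check are hypothetical stand-ins, since Google's actual pipeline isn't public:

    # Hypothetical sketch: keep only mail exchanged between real Gmail
    # accounts, on the theory that bulk marketing and spam mostly come
    # from outside domains or invalid senders.
    from dataclasses import dataclass

    @dataclass
    class Message:
        sender: str
        recipient: str
        body: str

    def is_active_gmail_account(addr: str) -> bool:
        # Stand-in for a lookup against a real account registry.
        return addr.endswith("@gmail.com")

    def human_to_human(mail: list[Message]) -> list[Message]:
        # Keep Gmail-to-Gmail messages; drop everything else.
        return [m for m in mail
                if is_active_gmail_account(m.sender)
                and is_active_gmail_account(m.recipient)]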

Terr_ 2 days ago | parent [-]

I'd upgrade that "could possibly" leak to "will absolutely" leak, albeit with some loss of fidelity.

Imagine industrial espionage where someone is asking the model to roleplay a fictional email exchange between named corporate figures in a particular company.

SoftTalker 2 days ago | parent | prev | next [-]

> Google has ... company culture oriented towards optimizing the energy usage/hardware costs.

Google has a company culture of luring you in with freebies and then mining your data to sell ads.

AdrianB1 2 days ago | parent | prev | next [-]

> if Google can give away LLM usage (which is obviously subsidized) it can't be astronomically expensive

There is a recent article by Linus Sebastian (LTT) talking about YouTube: it is almost impossible to support the cost of building a competitor because it is astronomically expensive (vs. potential revenue).

SecretDreams 2 days ago | parent | prev [-]

I do not disagree that they will get cheaper, but I'm pointing out that none of this is being reflected in hardware pricing. You state LLMs are becoming more optimized (less expensive). I agree. This should have a knock-on effect on hardware prices, but it isn't happening. Where is the disconnect? Are hardware prices a lagging indicator? Is Nvidia still a 5 trillion dollar company if we see another 33x improvement in "energy consumed per text prompt"?

zozbot234 2 days ago | parent [-]

Jevons paradox. As AI gets more efficient, its potential scope expands further and the hardware it runs on becomes even more valuable.

BTW, the absolute lowest "energy consumed per logical operation" is achieved with so-called 'neuromorphic' hardware that's dog slow in latency terms but more than compensates with extreme throughput. (A bit like an even more extreme version of current NPU/TPUs.) That's the kind of hardware we should be using for AI training once power use for that workload is measured in gigawatts. Gaming-focused GPUs are better than your average CPU, but they're absolutely not the optimum.

mcphage 2 days ago | parent | prev | next [-]

> So even if a task costs $8 today thanks to VC subsidies, I can be reasonably confident that the same task will cost $8 or less without subsidies in the not-too-distant future.

The same task on the same LLM will cost $8 or less. But that's not what vendors will be selling, nor what users will be buying. They'll be buying the same task on a newer LLM. The results will be better, but the price will be higher than the same task on the original LLM.

PaulHoule 2 days ago | parent | prev | next [-]

It's not the hardware getting cheaper; it's that LLMs were developed when we really didn't understand how they worked, and there is still some room to improve the implementations, particularly to do more with less RAM... And that's everything from doing more with fewer weights to things like FP16, not to mention that if you can 2x the speed, you can get twice as much done with the same RAM and all the other parts.

SecretDreams 2 days ago | parent [-]

Improvements in LLM efficiency should be driving hardware to get cheaper.

I agree with everything you've said; I'm just not seeing any material evidence of it in hardware prices as of now.

sothatsit 2 days ago | parent [-]

Inference costs falling 2x doesn’t decrease hardware prices when demand for tokens has increased 10x.

PaulHoule 2 days ago | parent [-]

It's the ratio. If revenue goes up 10x, you can afford 10x more hardware, assuming you can fund it all.
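
In numbers, using the 2x and 10x from this subthread as illustrative figures:

    # Efficiency gains don't cut hardware demand if token demand grows
    # faster -- the Jevons-paradox point made upthread.
    cost_drop = 2        # per-token inference gets 2x cheaper
    token_growth = 10    # tokens served grow 10x

    compute_growth = token_growth / cost_drop
    print(f"total compute (and hardware) demand still grows {compute_growth:.0f}x")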

hug 2 days ago | parent | prev | next [-]

> I'd like to see this statement plotted against current trends in hardware prices ISO performance.

Prices for whom? The prices the big movers in the AI space are paying for hardware aren't sticker prices and never were.

The example you use in your comment, RAM, won't work: it's not 3x the price for OpenAI, since they already bought it all.

xpe 2 days ago | parent | prev | next [-]

> I fail to see how costs can drop while valuations for all major hardware vendors continue to go up. I don't think the markets would price companies in this way if they thought all major hardware vendors were going to see margins shrink to commodity levels, as you've implied.

This isn't hard to see. A company's overall profits are influenced – but not determined – by the per-unit economics. For example, increasing volume (quantity sold) at the same per-unit profit leads to more profits.
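
Illustratively, with made-up numbers:

    # Per-unit margin can shrink while total profit grows, as long as
    # volume grows faster than the margin falls.
    margin_before, volume_before = 4.00, 1_000_000
    margin_after, volume_after = 2.00, 5_000_000   # margin halves, volume 5x

    print(margin_before * volume_before)   # 4,000,000 in profit before
    print(margin_after * volume_after)     # 10,000,000 in profit after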

doctorpangloss 2 days ago | parent | prev | next [-]

> I fail to see how costs can drop while valuations for all major hardware vendors continue to go up.

Yeah. Valuations for hardware vendors have nothing to do with costs. Valuations are a meaningless thing to integrate into your thinking about something objective like whether the retail costs of inference will trend down (obviously yes).

forty 2 days ago | parent | prev | next [-]

What if we run out of GPUs? Out of RAM? Out of electricity?

AWS is already raising GPU prices; that has never happened before. What if there is a war in Taiwan? What if we want to get serious about climate change and start saving energy for vital things?

My guess is that, while they can do some cool stuff, we cannot afford LLMs in the long run.

jiggawatts 2 days ago | parent [-]

> What if we run out of GPU?

These are not finite resources being mined from an ancient alien temple.

We can make new ones, better ones, and the main ingredients are sand and plastic. We're not going to run out of either any time soon.

Electricity constraints are a big problem in the near term, but may sort themselves out in the long term.

twelvedogs 2 days ago | parent | next [-]

> main ingredients are sand and plastic

Kinda ridiculous point; we're not running into GPU shortages because we don't have enough sand.

renegade-otter a day ago | parent | next [-]

We already had a sand shortage. In 2019...

https://www.bbc.com/future/article/20191108-why-the-world-is...

Craighead 2 days ago | parent | prev | next [-]

Even funnier, there are legitimate shortages of usable sand.

jiggawatts a day ago | parent | prev [-]

That’s my point: the key inputs are not materials but the high tech machinery and the skills to operate them.

Draiken a day ago | parent [-]

And that's better because...?

We can't copy/paste a new ASML no matter how hard we try (short of open-sourcing all of their IP). Even if we did, by the time we copied one generation of machine, they'd be on a new generation, and the bottleneck would still be in the same place.

Not to mention that with these monopolies they can just keep increasing prices ad infinitum.

jiggawatts a day ago | parent [-]

ASML's secret sauce is not that secret or uncopyable. The Chinese are already working on their clone of the Twinscan tools.

Veritasium recently made a good video on the ASML machine design: https://youtu.be/MiUHjLxm3V0

The outcome may seem like magic, but the input is "simply" hard work and a big budget: billions of dollars and years of investment into tuning the parameters like droplet size, frequency, etc...

The interviews make it clear that the real reason ASML's machines are (currently) unique is that few people had the vision, patience, and money to fund what seemed at the time impossible. The real magic was that ASML managed to hang on by a fingernail and get a successful result before the money ran out.

Now that tin droplet EUV lasers have not only been demonstrated to be possible, but have become the essential component of a hugely profitable AI chip manufacturing industry, obtaining funding to develop a clone will be much easier.

Draiken an hour ago | parent [-]

> ASML's secret sauce is not that secret or uncopyable.

You must've watched a different video. They took a decade to get there, and they're happy to show all the how-tos because they know the devil is in the details.

forty 2 days ago | parent | prev [-]

If the US is ready to start a war against Europe to invade Greenland, it's certainly because they need more sand and plastic? Of course, by weight it's probably mostly sand and plastic, but the interesting bits probably need palladium, copper, boron, cobalt, tungsten, etc.

rhubarbtree 2 days ago | parent | next [-]

Well, also for military purposes.

And general imperialism.

jiggawatts a day ago | parent | prev [-]

Greenland is Trump’s Ukraine. He’s jealous of Putin, that is all.

There is nothing in Greenland worth breaking up the alliances with Europe over.

Trump is too stupid to realise this; he just wants land like it's a Civ game.

PS: An entire rack of the most expensive NVIDIA equipment millions of dollars can buy has maybe a few grams of precious or rare metals in it. The cost of those is maybe a dollar or two. They don't even use gold any more!

The expensive part is making it, not the raw ingredients.

gylterud a day ago | parent | next [-]

One might then suspect that breaking up alliances with Europe is the point of the whole thing.

jiggawatts a day ago | parent [-]

Some of the best advice I've ever heard is to look at how people act and ignore how they claim they act or their stated reasons for doing so.

A corollary is that even a "technically false" model can predict someone's actions better than a "truthful" one.

Trump may not be a Russian agent, but he acts like one consistently.

It's more effective to simply assume he's an agent of a foreign power, because that's the best predictor of his actions.

imcritic 19 hours ago | parent | prev [-]

That alliance costs money. It doesn't bring anything good in return: the USSR (which this alliance was created against) is long gone. Trump is a genius if he somehow manages to kill 2 birds with 1 stone: make the OTHER parties of the alliance want to disband it AND get a piece of land with a unique strategic position all to himself/the U.S.

I think it's Putin who is going to be jealous of Trump, not the other way around.

iwontberude 2 days ago | parent | prev | next [-]

Your point could have made sense, but the amount of inference per request is also going up faster than the costs are going down.

supern0va 2 days ago | parent | next [-]

The parent said: "Of course, by then we'll have much more capable models. So if you want SOTA, you might see the jump to $10-12. But that's a different value proposition entirely: you're getting significantly more for your money, not just paying more for the same thing."

SOTA improvements have been coming from additional inference due to reasoning tokens and not just increasing model size. Their comment makes plenty of sense.

manmal 2 days ago | parent | prev [-]

Is it? Recent new models tend to need fewer tokens to achieve the same outcome. The days of ultrathink are coming to an end; Opus is quite usable without it.
