| ▲ | BadBadJellyBean 2 days ago |
| We are slowly inching closer to the point where AI and AI products will be billed for what they cost. We are currently living in a heavily discounted world where everything is subsidized to the point where a lot of it is free. It seems like they can't or won't keep that up anymore. My prediction is that whenever one of the big companies raises its prices or moves features to higher tiers, others will follow soon. They all feel the pressure and none of them want to give away more money than they need to. I wonder if managers will be as excited about AI when the prices go up. |
|
| ▲ | Aurornis 2 days ago | parent | next [-] |
| > We are slowly inching closer to the point where AI and AI products will be billed for what they cost. I suspect the API prices are already served with profitable unit economics. The SOTA API prices are much higher than the costs for other providers to run very large open weight models. The monthly subscription plans were being offered at a discount to generate interest in these models. We're not entering a period of billing AI at cost. We're entering a period of exploring how high the prices can go before losing too many customers. Products and services aren't sold at cost. They're sold at the price the market will bear. It takes some experimentation to find that equilibrium point where you make more profit per customer but don't lose too many customers. |
| |
| ▲ | malfist 2 days ago | parent [-] | | > I suspect the API prices are already served at prices with profitable unit economics. There is absolutely no evidence to support this. | | |
▲ | selectodude 2 days ago | parent | next [-] | | Some basic math supports it. A GB300 NVL72 is about $6.5 million. Let's say that you need $6 million worth of cooling and another $6 million worth of electricity. At current rates, that's 720 billion tokens worth of Claude Opus 4.7. At 100,000 tokens per second, it pays for itself in about 3 months. Obviously this is an extremely rough calculation. I can even be off by a factor of 10 and it's still a pretty good return. | | |
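The back-of-envelope math above can be written out explicitly. Every figure here is the commenter's assumption (the hardware, cooling, and electricity costs, the 720-billion-token revenue equivalent, and the 100k tokens/sec throughput), not verified data:

```python
# Rough payback sketch using the commenter's assumed figures.
hardware = 6_500_000      # GB300 NVL72 system (assumed)
cooling = 6_000_000       # assumed
electricity = 6_000_000   # assumed
total = hardware + cooling + electricity   # $18.5M all-in

breakeven_tokens = 720e9  # tokens of revenue that cover $18.5M (comment's figure)
throughput = 100_000      # tokens per second, sustained (assumed)

# Implied blended price per million tokens, and payback period.
implied_price = total / (breakeven_tokens / 1e6)
seconds = breakeven_tokens / throughput
months = seconds / (30 * 24 * 3600)
print(f"implied price: ${implied_price:.2f}/M tokens, payback: {months:.1f} months")
```

The implied blended price (~$25.70 per million tokens) and the ~2.8-month payback line up with the "about 3 months" estimate; even a 10x error on the cost side leaves a payback of a couple of years.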
▲ | overfeed 2 days ago | parent [-] | | Unless you're serving Chinese open-weight models - you have to consider training costs. If you're off by 10x, then the amortization period is 30 months - far longer than the useful lifetimes of SoTA models. Frontier model development is a Red Queen's race: you have to run as fast as you can, just to maintain your position. | | |
| ▲ | selectodude a day ago | parent [-] | | The discussion was if Anthropic makes money on inference. They do. They lose billions on training. | | |
| ▲ | joshuastuden a day ago | parent [-] | | No, because Anthropic can't serve their models unless they train them. Training is akin to the cost of building the software/product. Inference is selling the product. | | |
| ▲ | BadBadJellyBean a day ago | parent [-] | | It's quite easy to sell something for a profit if you ignore the costs. Ultimate free money hack. I will start selling canned beans for the price of the beans plus a few cents. I will just ignore the cost of the cans, labor, power, machines, maintenance, distribution, storage and facility space. If I do that the few cents extra are pure profit. |
|
|
|
| |
▲ | speedgoose 2 days ago | parent | prev | next [-] | | We don’t know the model sizes, requirements, and optimisations, but we could take a guess using the infrastructure costs of the largest open weight alternatives that perform slightly worse. In my opinion, it’s a profitable kind of service. They probably don’t pay the public prices for the cloud GPUs though. | | |
▲ | BadBadJellyBean a day ago | parent | next [-] | | Just looking at infra cost is not enough. If the token price doesn't cover all the costs, they are losing money and will eventually have to raise prices further. | |
| ▲ | hyperadvanced 2 days ago | parent | prev [-] | | In my opinion it seems like a very unprofitable service propped up by investor money trying to capture market share. Or, as I would say if I were Bugs Bunny, “Duck Season” | | |
| |
▲ | Aurornis 2 days ago | parent | prev | next [-] | | > There is absolutely no evidence to support this. Analysts like SemiAnalysis have done a lot of modeling and estimates on the topic. But two can play this game: There is absolutely no evidence to support that API prices do not have profitable unit economics. | | |
| ▲ | Khalos a day ago | parent [-] | | I'm not familiar with that analysis, its accuracy, or its evidence. I would be surprised by this given it seems like providers are still in the growth phase. Typically the burden of proof is on the one making the claim. | | |
▲ | Aurornis a day ago | parent | next [-] | | https://semianalysis.com/ They have some of the best publicly available analysis on these topics. The full details and numbers are hidden behind the institutional accounts which are priced for investors (not something you sign up for personally) but they're generous with what they send out in their newsletter. If you're not familiar with resources like this I could understand how you'd assume that the providers are hemorrhaging money on inference costs, because that is the story that gets parroted around spaces like Hacker News. You could ignore all of that, though, and go check OpenRouter to see how much providers are selling high parameter count models for. They're not entirely at the level of the SOTA models, but the biggest open weight models are not that far behind in complexity either. They're being sold an order of magnitude cheaper than what you pay for the APIs from the major players. We don't know exactly how big the major models are, but from the leaks we do have it's unlikely that they're more than 10X more compute intensive. | | |
| ▲ | joshuastuden a day ago | parent [-] | | [flagged] | | |
| ▲ | Aurornis a day ago | parent | next [-] | | If you’re demanding rigorous proof for only one side of an argument while assuming the other side must be true, you’re not interested in honest debate. The cost of AI inference has been a heavily analyzed topic. I trust the professional analysts much more than the casual Hacker News commenter claiming they’re losing money per token because they’re repeating what they heard some other Hacker News commenter say | |
| ▲ | JohnHaugeland a day ago | parent | prev [-] | | the claim that they suspect something is adequately backed up by saying “i suspect this” nobody needs to prove their suspicions |
|
| |
| ▲ | JohnHaugeland a day ago | parent | prev [-] | | there is no burden of proof on someone who says “i suspect” |
|
| |
▲ | winfredJa 2 days ago | parent | prev [-] | | Mr. Truth-teller Amodei confirmed that APIs are profitable at Anthropic. | | |
| ▲ | Eufrat 2 days ago | parent | next [-] | | I don’t think any of the AI model providers have produced any evidence to back their claims of profitability. I want to see their S-1s, then we can fight. | |
| ▲ | disgruntledphd2 2 days ago | parent | prev [-] | | He didn't, he talked very carefully in hypotheticals. |
|
|
|
|
| ▲ | bobbiechen 2 days ago | parent | prev | next [-] |
| I called this last year: https://digitalseams.com/blog/the-ai-lifestyle-subsidy-is-go... . I see it as no different from the previous generation of consumer startups burning money - as Derek Thompson wrote, > ...if you woke up on a Casper mattress, worked out with a Peloton, Ubered to a WeWork, ordered on DoorDash for lunch, took a Lyft home, and ordered dinner through Postmates only to realize your partner had already started on a Blue Apron meal, your household had, in one day, interacted with eight unprofitable companies that collectively lost about $15 billion in one year. |
| |
▲ | unrelat3d 2 days ago | parent [-] | | Everyone called it last year and the year before. The conversation around AI being cheap now started when ChatGPT launched in late 2022. |
|
|
| ▲ | libraryofbabel 2 days ago | parent | prev | next [-] |
| This is already happening. For new Anthropic enterprise accounts you are billed at api token prices (maybe with a small volume discount). Anthropic makes a profit on those tokens. (Sure, that profit does not cover the model training costs, but that’s a separate issue.) It’s the subscriptions for individuals (e.g. Claude Max) that are still subsidized below cost. > I wonder if managers will be as excited about AI when the prices go up. Companies are willing to pay the api pricing. Engineering time is very expensive, and AI coding agents have actually worked since December, finally showing measurable productivity gains. It’s a good deal to make (obviously, with caveats: you need to make sure your tokens are going on productive tasks that will actually grow revenue) and anyone who penny-pinches is making a strategic mistake. |
| |
▲ | ericmcer 2 days ago | parent | next [-] | | "Engineering time is very expensive" I always wondered about this statement. We are generally salaried, and there are so many variables that affect how I spend my "time". None of us are machines that can do X work per day that our managers get to slice as they see fit. Pull a dev off a project they love and throw them onto something they hate and suddenly X is diminished greatly. I would almost predict that reshaping our workflow into "prompt, wait, approve changes" results in losses because it is such a mentally tiring workflow and drills into our brains the desire for the LLM to "just fix it". It is the next level of just moving tickets to completed all day. | |
| ▲ | BadBadJellyBean 2 days ago | parent | prev | next [-] | | > Sure, that profit does not cover the model training costs, but that’s a separate issue. I don't think it is. At some point they have to make money and they can't do that if the token cost doesn't include ALL the costs. Someone has to pay for that at some point. And someone has to pay for the subsidized subscribers. So no. API token prices don't reflect the real price. They are still subsidized. Just in a different way. | |
| ▲ | mikeocool 2 days ago | parent | prev | next [-] | | > Sure, that profit does not cover the model training costs, but that’s a separate issue It is? If another company comes out with a better model tomorrow and offers it at the same price Anthropic charges for Opus, they’re going to lose customers fast. They have to keep training to keep selling inference. Most businesses factor in the cost of making their product into the product’s P&L. | | |
▲ | cyanydeez 2 days ago | parent [-] | | also, like super mario kart, SOTA models from the rear will be continually released because they're sunk costs and open weights will advertise for themselves. Also, it's clear FOMO is a DDoS attack on any perceived leader because there's no way they don't oversell. Lastly, they'll realize, like every good capitalist, there's more profit in exclusiveness and cutting out customers. |
| |
| ▲ | malfist 2 days ago | parent | prev | next [-] | | > Anthropic makes a profit on those tokens Citation needed. Anthropic does not have public books | | |
| ▲ | libraryofbabel 2 days ago | parent [-] | | Their CEO is on record as saying this. You may think he's lying, but that's just your opinion; given the pricing and how it stacks relative to the pricing of inference providers of comparable open source models (who are certainly charging above cost!), I am inclined to believe Anthropic on this. | | |
▲ | hyperadvanced 2 days ago | parent [-] | | Why would you believe a tech CEO who has a vested interest in the untruth but can skirt fiduciary duties by speaking cleverly? | | |
| ▲ | bostik 2 days ago | parent | next [-] | | Maybe because Anthropic are trying to get to an IPO and everything is securities fraud? If their CEO was just flapping his mouth without any other comparable baseline, it'd probably be different. But as the GP points out, open-weight model providers are charging comparable rates and very likely have positive profit margins. That would imply that with API pricing tokens are sold at above cost. That cost may well be "inference only", so excludes everything apart from hardware and power. Whether that's enough to cover the enormous training costs and other overheads is a different question. | |
| ▲ | JohnHaugeland a day ago | parent | prev | next [-] | | why would we believe skeptical randos on social media? he has access to the real numbers and a legal risk from lying publicly. it does him no good to lie about this. | |
| ▲ | timschmidt 2 days ago | parent | prev [-] | | He just told you. Because overwhelming public evidence supports the claim. Especially the pricing of open weight model inference. Why do you allow a prejudice to overshadow evidence? |
|
|
| |
| ▲ | CodingJeebus 2 days ago | parent | prev [-] | | They may be for now. Problem is that when foundation model pricing goes up, you're paying not just the increase in tokens you consume directly, but also for all tokens you're consuming via vendors as well. If your company has Figma, Github, and Cursor and they're using the same models you are, your monthly costs with them increase as well. You're exposed N times to the foundation model price increases, where N is the number of times software you directly or indirectly use talks to a frontier model. |
|
|
| ▲ | ericmcer 2 days ago | parent | prev | next [-] |
| It is already bafflingly expensive. I interviewed at a place recently where they said the average dev was hitting $2k/mo in Claude Code costs. That is no longer a helpful tool... it costs like ~15% of an actual dev. Even if it is helping, is it actually... making things better or building anything truly important? The issue seems way too nuanced to justify spending $2k/mo. Not to mention the entire tech industry floats on hype and imaginary goal posts, so now what? Devs can hurtle towards those faster and more mindlessly? |
| |
| ▲ | Aurornis 2 days ago | parent | next [-] | | > That is no longer a helpful tool... it costs like ~15% of an actual dev. The full cost of each employee is more than their salary. The common estimate is 1.4X their salary due to all of the employer-paid taxes, benefits, and other things. So even $2K/month of token costs would only be around 10% of the cost of a mid-range developer cost. It doesn't have to increase productivity much to justify the cost. | | |
| ▲ | legulere 2 days ago | parent | next [-] | | Don't forget the organisational overhead. You'll need managers and communication overhead between developers grows superlinear (see Brook's law). | | | |
| ▲ | Silhouette 2 days ago | parent | prev | next [-] | | The arithmetic is a little different in every country because of local rates of pay and taxation but it's worth remembering that in most of the world except for the richer parts of the US developers do not get paid what those US developers have been making in recent years. There are a few exceptions but the norm is several times less even in major economies in Europe or Asia. Another challenge for US tech companies is that - if you'll forgive the bluntness - their "brand" is now toxic in most of the world. Almost everyone is trying to distance themselves from US tech as fast as they can. Governments and big businesses are starting to invest seriously in alternative solutions and local resources. It will happen over time but I don't see much the US tech companies or the US government can do to stop the train now the wheels are turning. So there's a serious risk for US tech companies now of a double whammy where their already relatively high R&D costs increase even further and yet they're also facing much stronger competition in international markets or maybe even excluded from some of those markets entirely. If we also reach the seemingly inevitable point that "capable enough" LLMs can run locally - or at least as a private resource provided internally by large organisations - there is very little moat left to protect not only US Big Tech whose stocks have been heavily driven by expected returns from AI but the whole US tech industry that is banking on productivity gains from that AI tech. Then they also won't be able to capture most of the entire global supply of components like GPUs/RAM/SSDs because it won't be cost effective any more - and that is one of the few practical moats they have built (however accidentally) that would be a significant barrier to direct competitors setting up shop in places like Europe and Asia. 
It's going to be interesting to see how US tech companies respond to these effects over the next 5-10 years. The giants are all aboard the AI train and can't back down now so there will probably be some casualties there if - as again seems inevitable - the bubble bursts at some point. But then there's a very long tail of still very successful US tech companies that might be paying US salaries and using AI-based tools but aren't themselves focussed on developing or providing those AI-based tools and they're the ones who are going to need to find new ways to compete effectively within that kind of time frame. | | |
| ▲ | demorro 4 hours ago | parent [-] | | I'm very excited for a tech sector disconnected from silicon valley. We've forgotten that you can get a lot of scrappy stuff done in a shed for quite cheap when you're not trying to inflate hype bubbles constantly. |
| |
| ▲ | sarchertech 2 days ago | parent | prev [-] | | The 1.4x multiple doesn’t work when you get to engineering salaries. |
| |
| ▲ | ilovecake1984 2 days ago | parent | prev | next [-] | | The real thing is that it is on tap. If I have to engage an Indian outsourcer to hand off easy stuff to it’s going to be much more painful. | |
| ▲ | jcgrillo 2 days ago | parent | prev [-] | | Some people are claiming to use a billion tokens per day. According to Claude's API pricing page that'll cost somewhere between $3k and $10k per day. Leaving aside whether we trust Claude's API pricing model will remain constant into the future, it's abundantly clear that developer is not generating tens of millions of dollars per year in value. |
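The $3k-$10k/day range follows from per-million-token API pricing. As a sketch, the $3/$15 input/output split below is an assumed Sonnet-class price, and the output fraction is a guess:

```python
# Daily cost of 1B tokens at assumed per-million-token prices.
tokens_per_day = 1_000_000_000

def daily_cost(input_price, output_price, output_fraction):
    """Dollar cost for a day's tokens at a given share of output tokens."""
    out = tokens_per_day * output_fraction
    inp = tokens_per_day - out
    return inp / 1e6 * input_price + out / 1e6 * output_price

low = daily_cost(3, 15, 0.0)    # all input tokens
high = daily_cost(3, 15, 0.6)   # output-heavy mix
print(f"${low:,.0f} to ${high:,.0f} per day")
```

That gives $3,000 to $10,200 per day, consistent with the range in the comment; Opus-class pricing would be several times higher again.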
|
|
| ▲ | pphysch 2 days ago | parent | prev | next [-] |
| Slowly inching? GitHub Copilot announced 600%+ price increases for many workflows, with others being potentially 100x more expensive due to the change from request to token based billing. |
| |
▲ | BadBadJellyBean 2 days ago | parent [-] | | With how opaque everything in the AI world is, I have no idea how far away we are from the "real" prices. Might be a gallop, or maybe it's much worse than we think and we are still just inching. We'll see. Exciting times. If you like to see the world burn. |
|
|
| ▲ | LocalH 2 days ago | parent | prev | next [-] |
| That's what the cloud AI industry is banking on. Pushing hard to get AI into workflows at a critical position, then raising the cost to turn the screws, hoping that companies would rather pay than pivot again |
|
| ▲ | sassymuffinz 2 days ago | parent | prev | next [-] |
| It's humorous to me that I can do the work of an AI with nothing but a coffee and an occasional sandwich and yet they talk about AI as if it's some sort of magic hack to productivity. What they don't like is paying money for the work, that's all that matters to them. |
| |
| ▲ | jakobnissen 2 days ago | parent [-] | | You work for sandwiches and coffee, and not for a decent salary? | | |
| ▲ | sassymuffinz 2 days ago | parent [-] | | I get paid a decent salary for my expertise and this wild thing called remembering what happened yesterday. My point is my compute is powered by bananas and coffee, not gigawatts of energy and all of the RAM. | | |
▲ | rescripting a day ago | parent | next [-] | | From the POV of your employer your compute is powered by your decent salary, not bananas and coffee. If that goes away you’d be a fool to keep pointing your brain at your employer’s problems. Thus, your compute is significantly more expensive than AI. Thankfully your taste is also part of the package deal, and is where you deliver the real value over an LLM. | |
| ▲ | JohnHaugeland a day ago | parent | prev [-] | | i don’t really care if my salable work product is banana sourced | | |
▲ | sassymuffinz 18 hours ago | parent [-] | | If you don't care about humanity and prefer the machines then you're a soulless ghoul like the OpenAI workers. You should sign up; you'd probably get a fat bonus for joining. | | |
| ▲ | JohnHaugeland 17 hours ago | parent [-] | | insults won’t convince people to agree with you sometimes people just have different belief systems than you, and that’s actually okay |
|
|
|
|
|
|
| ▲ | hacker_homie 2 days ago | parent | prev | next [-] |
| run local models |
| |
▲ | mark_l_watson a day ago | parent | next [-] | | I experiment a lot with local models: great results for engineering tasks, less so for coding agents. I have used the following on a 32G Mac Mini to help write useful code: ollama launch claude --model qwen3.6:27b-coding-nvfp4 The problem is that running local models (except for engineering tasks like data munging) is slow. With the above setup I set up a task (asking for no user verification) and go for a walk to wait for results that my Gemini Ultra plan would produce in 10 seconds. | |
| ▲ | SchemaLoad 2 days ago | parent | prev [-] | | You need massively expensive hardware to run them, and they aren't as good. It's pretty clear the base price of AI tools is way higher than we are being charged right now. | | |
| ▲ | pixelpoet a day ago | parent [-] | | I wouldn't call my $2k Strix Halo computer "massively expensive", and it runs e.g. Qwen 3.6 27b brilliantly, with tons of memory to spare and is a full x86 powerhouse pulling 120w at absolute max. IMO the programming world is far too myopic about / insistent on using laptops, especially macbooks. Just because a crappy deal exists doesn't mean everyone is forced to take it. Local AI is a high performance computing problem and laptops are fundamentally a crappy form factor for it; buy an efficient desktop computer and be surprised at what's possible even with today's crazy prices. |
|
|
|
| ▲ | XCSme 2 days ago | parent | prev [-] |
| But with new hardware coming out, and maybe models being smart enough to help with optimizing themselves and reducing inference costs even more, I think we should still expect the costs to go down. |