revolvingthrow 2 days ago

> pricing "Pro" $3.48 / 1M output tokens vs $4.40

I’d like somebody to explain to me how the endless comments of "bleeding edge labs are subsidizing the inference at an insane rate" make sense in light of a humongous model like v4 pro being $4 per 1M. I’d bet even the subscriptions are profitable, much less the API prices.

edit: $1.74/M input $3.48/M output on OpenRouter

schneehertz 2 days ago | parent | next [-]

This price is high partly because of the current shortage of inference cards available to DeepSeek; they claimed in their press release that once the Ascend 950 compute cards launch in the second half of the year, the price of the Pro version will drop significantly.

Bombthecat 2 days ago | parent [-]

In six months DeepSeek won't be SOTA anymore and usage will be way down.

randomgermanguy 2 days ago | parent | next [-]

Only comparing on SOTA scores (ignoring price etc.) is like choosing your daily-driver by looking at who makes the fastest sports-car...

LinXitoW 2 days ago | parent | next [-]

The constant improvement of SOTA is the main thing keeping the investment machine running. We can't really separate training costs from inference costs, because much of the funding and lending for inference hardware only exists because of the promises that continuous training makes (or tries to keep).

dnnddidiej 2 days ago | parent | prev [-]

Not really. SOTA vs non SOTA is "can I get my coding work actually done today" vs. "this can do customer support chat"

It is like car vs. kick scooter.

regularfry 2 days ago | parent | next [-]

It really isn't. We get coding work actually done today on Opus 4.5. That's not SOTA any more, and anything proximate to that level, even quite loosely, is genuinely useful.

dnnddidiej 2 days ago | parent [-]

OK, if we're agreed that Opus 4.5 is not SOTA, then by that definition... yes, you are right.

randomgermanguy 2 days ago | parent [-]

I mean, it's almost half a year; I think that counts?

dnnddidiej 2 days ago | parent [-]

Time-wise, you are correct.

randomgermanguy 2 days ago | parent | prev [-]

> "can I get my coding work actually done today" vs. "this can do customer support chat"

I think you need to define "can get coding work done" for this to make sense. I was using GPT-3 back then for basic scripts; does that count? Or only Claude Code?

I also think this is a false dichotomy: if you look at Project Vend or Vending-Bench, customer support etc. is by no means trivial. (Old but great story: https://www.businessinsider.com/car-dealership-chevrolet-cha...)

UlisesAC4 2 days ago | parent [-]

This. I have been writing my side-hustle code with opencode and the 3.2 reasoner, and it is way better than what I have at my day job with Copilot and whatever models are there.

wahnfrieden a day ago | parent | next [-]

Copilot is a bad harness that perverts the productivity of models like GPT 5.5.

dnnddidiej 2 days ago | parent | prev [-]

Tell me more please!

2ndorderthought 2 days ago | parent | prev | next [-]

A huge proportion of those scores are gamed anyways. Use whatever works for you at the price and availability you can afford

Palmik 2 days ago | parent | prev | next [-]

Or there will be DSv4.1/2/3 ;)

randomgermanguy 2 days ago | parent [-]

Definitely something in this realm, they call the models "preview" at a bunch of different points in the paper.

What im really hoping is for a double-punch like with V3 -> R1

man4 2 days ago | parent | prev | next [-]

[dead]

Barbing 2 days ago | parent | prev [-]

Well, if they distilled once…

menzoic 2 days ago | parent | prev | next [-]

API prices may be profitable. Subscriptions may still be subsidized for power users. Free tiers almost certainly are. And frontier labs may be subsidizing overall business growth, training, product features, and peak capacity, even if a normal metered API call is profitable on marginal inference.

dannyw 2 days ago | parent [-]

Research and training costs have to be amortized from somewhere, and the labs are always training. I'm definitely keen for the financials when the two file for IPO; it would be interesting to see, although I'm sure it won't be broken down much.

m00x 2 days ago | parent | prev | next [-]

They are profitable against opex, but not against capex under current depreciation schedules, though those schedules are now stretching longer than expected.

nl 2 days ago | parent [-]

Amazingly, current depreciation schedules underestimate the retained value of GPUs.

In 2023, the depreciation schedule for H100s was 2 years, but they are still oversubscribed and generating significant income.

CoreWeave has extended its GPU depreciation schedule to 6 years(!) now, which seems more realistic.

https://www.silicondata.com/blog/h100-rental-price-over-time
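For a sense of what the schedule change means on the books, here is a minimal straight-line depreciation sketch; the $30k purchase price is an assumed round number for illustration, not a quoted H100 price:

```python
# Book value of a GPU under straight-line depreciation.
# The $30k cost is an illustrative assumption, not a quoted H100 price.
def book_value(cost, life_years, age_years):
    """Remaining book value after age_years, floored at zero."""
    return max(0.0, cost * (1 - age_years / life_years))

cost = 30_000.0
for age in (1, 2, 3):
    v2 = book_value(cost, 2, age)   # the aggressive 2023-era schedule
    v6 = book_value(cost, 6, age)   # CoreWeave's new 6-year schedule
    print(f"year {age}: 2yr schedule ${v2:,.0f}, 6yr schedule ${v6:,.0f}")
```

Under the 2-year schedule the card is worth $0 on the books after year two, even while it is still oversubscribed and earning rental income, which is exactly the mismatch being pointed at.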

amunozo 2 days ago | parent | prev | next [-]

I was thinking the same. How can other providers offer third-party open-source models of roughly similar quality (Kimi K2.6, GLM 5.1) for a tenth of the price? How can GPT 5.5 suddenly cost twice as much as GPT 5.4 while being faster? I don't believe it's a bigger, more expensive model to run; they're simply starting to raise prices because they can and their product is good (which is honest as long as they're transparent about it). Honestly, the narrative about subscriptions costing the company 20 times more than we're paying is just PR to justify the price hike.

peepee1982 2 days ago | parent [-]

I'm pretty sure OpenAI and Anthropic overprice their per-token API usage mainly as an incentive to commit to their subscriptions instead.

simonjgreen 2 days ago | parent | next [-]

Anthropic recently dropped all-inclusive usage from new enterprise subscriptions; your seat sub gets you a seat with no usage included. All usage is then charged at API rates. It's the worst of both worlds!

peepee1982 2 days ago | parent [-]

What's the point then? Special conditions for data retention/non-training policies?

simonjgreen 2 days ago | parent [-]

The SSO tax is a large part of it, plus controls around the plug-in marketplace, enforcement of config, and observability of spend. But it's all pretty weak really for $20 a month.

And Microsoft is going the same route, moving Copilot Cowork over to a utilisation-based billing model, which is very unusual for their per-seat products (I'm actually not sure I can remember that ever happening).

weird-eye-issue 2 days ago | parent | prev [-]

The target audience for the APIs is third party apps which are not compatible with the subscriptions.

peepee1982 2 days ago | parent [-]

True. I missed that.

adam_patarino 2 days ago | parent | prev | next [-]

Prices are not just the hard cost of inference, and training costs are not equal: Chinese labs have cheaper access to large data centers, and I suspect they also operate far more efficiently than orgs like OpenAI.

mirzap 2 days ago | parent | prev | next [-]

My thoughts exactly. I also believe that subscription services are profitable, and the talk about subsidies is just a way to extract higher profit margins from the API prices businesses pay.

Bombthecat 2 days ago | parent [-]

Google stated a while back that, with TPUs, they are able to sell at cost or with a profit.

In other words: everyone who uses Nvidia can't sell at cost, because Nvidia is so expensive.

LinXitoW 2 days ago | parent | prev | next [-]

They got loans to buy inference hardware on the promise of potential AGI, or at least something approaching it, all leading to stupid amounts of profit for those investors.

We therefore cannot look at inference costs alone; training is part of the pitch. Without the promise of continuous improvement and the chase after elusive AGI, the money for inference investment evaporates.

WarmWash 2 days ago | parent | prev | next [-]

Because you are comparing China to the US.

In China you need to appease state goals. In the US you need to appease investor goals.

China will keep funding them regardless of their income, because the goal is (ostensibly) a state AGI/ASI. In the US, the goal is an ROI which may or may not come with AGI/ASI.

They are different economies with different goals. We can look at past Chinese national projects and see that they are fine with burning $50 to get [social goal] that's worth $5.

ting0 2 days ago | parent [-]

This is nonsense. The real reason is because the US companies are scamming the public, as per usual.

vitorgrs 2 days ago | parent | prev | next [-]

And they actually say prices will be "significantly" lower in the second half of the year, when the Huawei 650 chips come in.

Flavius 2 days ago | parent | prev | next [-]

It's because investors in OpenAI/Anthropic want to get their money back in 10 months, not in 10 years.

raincole 2 days ago | parent | prev | next [-]

Insert "always has been" meme.

But seriously, it stems from the fact that some people want AI to go away. If you set your conclusion first, you can easily derive any premise: AI must go away -> AI must be a bad business -> AI must be losing money.

2 days ago | parent | next [-]
[deleted]
louiereederson 2 days ago | parent | prev | next [-]

It is possible to question the sustainability of the AI buildout and not have a dogmatic position on AI development.

There are still major unanswered questions here. For instance, all of the incremental data-center capacity buildout is going to businesses with totally unknown long-term unit economics that are burning obscene amounts of cash today.

evilos 2 days ago | parent | prev | next [-]

The people who doubted the sustainability of the dot-com era bubble were correct, even though the tech really was transformational. Personally, I expect roughly the same outcome.

zarzavat 2 days ago | parent | prev [-]

Before the AI bubble that will burst any time now, there was the AI winter that would magically arrive before the models got good enough to rival humans.

jimmydoe 2 days ago | parent | prev | next [-]

They've also announced that the Pro price will drop further in 2H26, once they have more Huawei chips.

masafej536 2 days ago | parent | prev | next [-]

Point taken, but there aren't any Western providers there yet. Power is cheaper in China.

3uler 2 days ago | parent | next [-]

These models are open, and there are tons of Western providers offering them at comparable rates.

NitpickLawyer 2 days ago | parent | prev | next [-]

Since this is a new architecture with tons of optimisations, it'll take some time for inference engines to support it properly and for more 3rd-party providers to offer it. Once that settles, we'll have a median price for an optimised 1.6T model and can "guesstimate" from there what the big labs could reasonably serve for the same price. But yeah, it's been said for a while that the big labs are OK on API costs. The only unknown is whether subscriptions are profitable. They've all been reducing limits lately, it seems.

ithkuil 2 days ago | parent [-]

Is there evidence that the frontier models at Anthropic, OpenAI, or Google aren't using comparable optimizations to drive down their costs, and that their markup is simply higher because they can get away with it?

persedes 2 days ago | parent | prev [-]

Not so much, though. Power is heavily subsidized for residential consumption, but industrial rates are almost comparable to the US (depending on the state, etc.).

ting0 2 days ago | parent | prev | next [-]

They don't make sense; they're a lie these AI companies keep spamming via bots so that useful idiots perpetuate it and they can keep draining us of money. Straight out of the Anthropic handbook. These models have always been cheap to run. I wouldn't be surprised if Anthropic's cost is under $1 per 1M tokens.

dminik 2 days ago | parent | prev | next [-]

I mean, not one "bleeding edge" lab has stated they are profitable. They don't publish financials aside from revenue. And in Anthropic's case, they fuck with pricing every week. Clearly something is wrong here.

npn 2 days ago | parent [-]

You know, if you didn't have to pay insane salaries for your top engineers, and didn't have to pay billions for internet shills to control the narrative, all of the labs would be insanely profitable.

crazylogger 2 days ago | parent | prev | next [-]

I haven't seen anyone claiming that API prices are subsidized.

At some point (from the very beginning until ~2025Q4), Claude Code's usage limit was so generous that you could get roughly $10-20 (API-price-equivalent) worth of usage out of a $20/mo Pro plan each day (2 * 5h windows), and for good reason: LLM agentic coding is extremely token-heavy, and people simply wouldn't return to Claude Code if the included usage weren't generous or if every prompt cost them $1. Then Codex started trying to poach Claude Code users by offering even greater limits and constantly resetting everyone's limits in recent months. The API price would have to be ~30x operating cost for this not to be a subsidy. That would be an extraordinary claim.
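The claimed multiple follows from simple arithmetic on the comment's own figures; every number below is illustrative (taken from or implied by the comment), not measured:

```python
# Sanity-check of the Claude Code subsidy claim using the comment's figures.
monthly_sub_price = 20.0        # Pro plan, $/month
daily_api_equiv_usage = 15.0    # midpoint of the $10-20/day estimate
days_per_month = 30             # assumes a heavy daily user

monthly_usage_value = daily_api_equiv_usage * days_per_month
ratio = monthly_usage_value / monthly_sub_price

print(f"API-equivalent usage consumed: ${monthly_usage_value:.0f}/mo")
print(f"usage-to-price ratio: {ratio:.1f}x")
```

A heavy user consumes roughly 22x the plan price in API-equivalent tokens, so the plan only avoids being a subsidy if the API list price carries at least that multiple over marginal serving cost, which is the "30x" point above.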

nl 2 days ago | parent | next [-]

The claim that APIs are subsidized is very common.

e.g.:

> Token prices are significantly subsidized and anyone that does any serious work with AI can tell you this.

https://news.ycombinator.com/item?id=47684887

(the claims don't make any sense, but they are widely held)

vessenes 2 days ago | parent [-]

I'll note that it's common and dangerous, in that a generation of engineers is at risk of leading each other astray about the economics, and therefore the probability distribution of outcomes, of firms that will massively impact their careers.

I think I understand the major reasons for this meme, but I find it really worrying; there were lots of incorrect ‘it’s a bubble’ conversations here in 2012-2015, but I don’t think they had the pervasive nature and “obvious” conclusion that a whole generation of engineering talent should just, you know, leave.

Meanwhile, I am hearing rational economic modeling from the companies selling inference. Jensen (a polished promoter, I grant you) says it really well: token value is increasing radically, in that new models bring better quality, so revenues and utilization are increasing, and therefore, contrary to the popular financial and techbro modeling of 2023, things like A100s still cost quite a lot, whether rented hourly or purchased. (!) Basically, the economic value is so strong that it has radically extended the life of the hardware.

I just hate to imagine like half of the world’s (or US’s) engineering talent quitting, spending ten years afraid, or wrongly convinced of some ‘inevitable’ market outcome. Feels like it will be bad for people’s personal lives, and bad for progress simultaneously.

mike_hearn 2 days ago | parent [-]

People shouldn't be quitting the industry, agreed. There's plenty of work to do even with AI assistance.

But how is that a counterpoint to tokens being subsidized? They obviously are subsidized, this just isn't arguable at all. The claims in the linked post make perfect sense. If they weren't subsidized the investors in AI labs would all be minting money instead of burning it.

It doesn't matter if token value is increasing. What matters is how fast it increases relative to the price increases, the repayments on the debt loads and other things we can't really know here on this forum.

Every attempt I've seen to argue this fact away is merely playing with numbers, e.g. excluding every cost except inference hardware and energy, even though the labs are always training and have large costs outside of compute. This might or might not be a good way to predict the future of these orgs, but it doesn't help anyone argue that inference is profitable today (because inference is literally the only thing OpenAI/Anthropic sell, and they lose money).

The whole computing industry is in a super weird place right now that feels temporary, like Wile E. Coyote spinning his legs suspended in mid air. Until the economics of the AI industry stop being driven by FOMO and weird, hard to interpret quasi-religious or geopolitical motivations, it's impossible to make accurate predictions about what the impact on software jobs will be. Historically a tech like this would have started at super-high prices and the token cost would have gradually fallen over a period of decades, giving people plenty of time to adapt. Look at the cost of flying, desktop computers, mobile phones, etc. AI is attempting to short circuit that normal technological path and pack decades into years by convincing capital holders that they have no choice but to "invest" because it'll be a winner-takes-all repeat of web search and social media. Yet it's not shaping up that way.

nl a day ago | parent | next [-]

> But how is that a counterpoint to tokens being subsidized? They obviously are subsidized, this just isn't arguable at all.

Why would Microsoft subsidize Anthropic's models when they serve the Claude model on Azure? They charge the same price as Anthropic. They aren't an investor in Anthropic.

There are numerous independent model-serving companies that are clearly profitable serving non-frontier models (Kimi K2.5, etc.). It's easy to work out the raw cost of B200 GPUs and then see what you'd need to charge for an API to make money.
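The "work out the raw costs" exercise looks roughly like this; the rental rate, node size, throughput, and utilization below are all assumptions for illustration, not measured figures for any real model or provider:

```python
# Rough per-token serving cost from GPU rental rates (all inputs assumed).
gpu_hourly_rate = 5.0      # $/hr for one B200-class GPU (assumption)
gpus_per_node = 8          # one serving replica (assumption)
tokens_per_sec = 10_000    # aggregate output throughput per node (assumption)
utilization = 0.5          # fraction of capacity actually billed (assumption)

node_cost_per_hour = gpu_hourly_rate * gpus_per_node
billed_tokens_per_hour = tokens_per_sec * 3600 * utilization
cost_per_million = node_cost_per_hour / billed_tokens_per_hour * 1_000_000

print(f"raw serving cost: ${cost_per_million:.2f} per 1M output tokens")
```

Under these made-up inputs the raw cost lands in the low single digits of dollars per 1M output tokens, which is why independent hosts can profitably undercut frontier-lab list prices.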

The frontier labs charge a lot more than these companies.

The frontier labs have said they are profitable on inference.

Most people believe that training (and maybe subscriptions for some users) is where they lose money. Why do you think otherwise?

mike_hearn a day ago | parent [-]

Who says it's MS subsidizing those prices and not Anthropic themselves? Just because someone rehosts a model doesn't imply they get to set whatever price levels they want.

I don't think otherwise, I just think it's meaningless to differentiate between training and inference. What the frontier labs sell is inference. They can't just exclude costs required to engage in that business unless they plan a pivot to just serving Chinese models in a commodified market.

Yes, tokens from random no-name firms serving Kimi K2 probably do make money, although even there it's unclear, because so many datacenters and GPU purchases have been made on credit, etc. And if we assume that's sustainable forever, then you can assume training/staffing costs are subsidized to zero and say that, sure, token serving is profitable in that scenario. But we were discussing the top labs.

vessenes 8 hours ago | parent | prev [-]

Hi Mike! Long time - super nice to see your name in my HN feed.

I’ll fight you on profit. The major labs are super profitable. If you replace “profitable today” with “cashflow positive today” then I think you’re correct. They are clearly not cashflow positive today. However, they are absolutely profitable, and when people confuse those I think it can be dangerous.

Consider a series of companies, let’s call these companies “Claude 1, Inc”, “Claude 2, Inc”, “Claude 3, Inc”, “Claude 4, Inc”.

In each company let’s keep track of the following:

* The pro-rata hardware and energy costs the company used during training. So, for instance, if a cluster is going to "last" 5 years, and we used it for 2, and the cluster cost $1 billion to build, provision, and pay for 5 years of energy, we would charge $400mm.

* The R&D expenses like salary and so on

* The inference costs of every use of that company’s model.

* The revenue acquired in exchange for use of that model.

I propose, first, that I haven't hidden any costs or double-counted any revenue: this is a full, fair assessment of the costs and likewise the revenue earned. Second, if you go to the end of the company's final period, "profitability" equals "cashflow", so we can talk about either without talking past each other. Third, if you add up all the costs and expenses of Claude 1-4, Inc., you'd have the full P&L of Anthropic, up to any training done on Claude 5.

I will now repeat a statement made publicly and repeatedly by Dario (and Sam in a slightly more cagey way): every single one of those “companies” (fully loaded models) has turned a profit so far. Put another way, it has, repeatedly, been a very good financial decision to train a model, and then sell inference of that model.

Why are the frontier companies spending cash? Simple - as each new model comes out, it’s quickly apparent that the new model will pay, and so increased training costs are incurred before that model has ended its useful life. Due to scaling activity, each new run costs some multiple of the prior run. Combining the overlap and the scale up, these companies are cashflow negative. But they aren’t doing it in some weird race to spend a dollar to make $0.50. They’re spending a dollar to make like $6 a year for a year or two.
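The per-model "Claude N, Inc" accounting above can be sketched directly; every dollar figure below is an invented placeholder for illustration, not a real Anthropic number:

```python
# Fully-loaded P&L for one hypothetical model generation (all figures invented).
def model_pnl(prorated_training_hw, rnd_expense, inference_cost, revenue):
    """Profit after charging pro-rata training hardware, R&D, and inference."""
    return revenue - (prorated_training_hw + rnd_expense + inference_cost)

# Hypothetical "Claude N, Inc": $400mm pro-rata cluster cost, $300mm R&D,
# $500mm inference opex, $2bn revenue over the model's useful life.
profit = model_pnl(400e6, 300e6, 500e6, 2_000e6)
print(f"fully-loaded profit: ${profit / 1e6:.0f}mm")
```

Each generation can be profitable on this fully-loaded basis while the company as a whole stays cashflow-negative, because the next, larger training run is paid for before this model's revenue has finished arriving.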

If you see this, most of the ‘bubble’ (and implied massive crash) forecasts don’t seem to have any basis in reality from my perspective.

Frontier lab models are fucking great earners: 60%+ inference margins (public statements by said CEOs; lateral proof: similar-sized open models are available for inference at 1/8 to 1/10 the price on OpenRouter, ergo closed-model margins are high). These earnings are real dollars, hard cash. Maybe the datacenters are in a bubble? After all, there's a lot of debt being taken on for datacenter buildouts.

Datacenter companies and hyperscalers are making money providing hosting to these frontier labs. CoreWeave (a former ETH miner!) and others are posting 70% profit margins against debt costs under 8%. These profits are again hard dollars from the labs. So, maybe the hardware providers are in a bubble?

Nvidia is making 70%+ margins, consistently beating every earnings call, and is spending about $6bn a quarter on R&D against $40+bn in cash share buybacks. They are moving super fast, and they could literally spend another 7x their current R&D budget before going cashflow negative. So, maybe the foundries are in a bubble?

TSMC is showing 66% margins (a record high) and is cutting Apple's allocation to the point where there are research warnings about it. Maybe the EUV lithography companies are in a bubble?

ASML is the most generous company in the world, and is showing 34% operating margin this year while providing the only machines that can make the chips that TSMC and others are selling.

This is all very real. To my eyes the possible negative financial outcomes that seem plausible are:

1 - Scaling laws stop working (and/or models get "good enough"), and all of a sudden the new hotness we just spent our entire last 5 years' revenue on isn't any better.

2 - There’s some major exogenous shift in demand for tokens and datacenter utilization drops radically, leading to credit defaults.

The main caveats are that these things would have to be industry-wide before they were a problem, and demand would have to fall to less than roughly 1/6 of current forecasts before causing a cascading financial problem. Until then we'd see CoreWeave breaking even, reworking its debt covenants, buying less power (unused capacity), and paying lower prices for the power it does use (overcapacity = cheaper power), etc.

This is SUPER long already, but to close: I think it's reasonable and interesting to talk about those scenarios. How likely is it that scaling stops working, or that people are OK with what we've got (that is, that token value stops increasing in a compute-unitized environment)? How likely is it that people stop buying tokens at all, even if their utility is stable or growing?

Agreed we’re in a temporary transitional phase right now, but I think it’s to a radically new business model and economic order more than it is a prelude to a giant debt leveraged crash, Wile E. Coyote style.

dannyw 2 days ago | parent | prev [-]

Yeah, subscriptions used to be extraordinarily generous. I miss those days, but the reinvigoration of open-weight models is super exciting.

I'm still playing with the new Qwen3.6 35B and impressed, and now DeepSeek v4 drops, with both base and instruction-tuned weights. There goes my weekend :P

sekai 2 days ago | parent | prev | next [-]

> I’d like somebody to explain to me how the endless comments of "bleeding edge labs are subsidizing the inference at an insane rate" make sense in light of a humongous model like v4 pro being $4 per 1M. I’d bet even the subscriptions are profitable, much less the API prices.

One answer - Chinese Communist Party. They are being subsidized by the state.

2 days ago | parent | next [-]
[deleted]
lbreakjai 2 days ago | parent | prev [-]

When China does it, it's communism. When companies in the West get massive tax cuts, rebates, incentives, and subsidies, that's just supporting the captains of industry.

casey2 2 days ago | parent | prev [-]

It's the decades of "performance doesn't matter" SV/web culture. I'd be surprised if even 1% of OpenAI/Anthropic staff know how any non-toy computer system works.