Remix.run Logo
827a 5 hours ago

> Zitron's numbers don't tell us the real cost of generating tokens but, subject to the assumption that the platforms are not subsidizing the token price, that means Anthropic is subsidizing their enterprise customers by up to 40 times, and OpenAI up to 70 times

Neither Anthropic nor OpenAI are subsidizing enterprise customers. Neither Anthropic nor OpenAI allow Business nor Enterprise customers access to the high value $200/mo plan. Both organizations have moved to a "cheaper plan per user + API Pricing after that" (e.g. $20/mo + usage). The $100/$200/mo plans are for individuals only (of course, many individuals use these plans at work, but that's beside the point; they aren't selling this plan to enterprises).

> SemiAnalysis also analyzed the platform's gross margins, implausibly assuming that tokens were priced at 4 times the cost of generating them and: With the current subsidies, all it takes for a user to have a gross margin of at best negative 25% is for them to use as little as 25% of their rate limit.

The article's source for this claim is not SemiAnalysis; its Zitron. But once you dig through his article, Zitron links to a SemiAnalysis tweet [1] where they, as the paragraph states, implausibly assume gross margins of 75% to come up with their weird analysis of the subscription plans. Citing this for anything is weird, because afaik that 75% number is a total shot in the dark. We have no clue what their margins are. My take is that the only reason that 75% number is implausible is because it may underestimate the inference margins of Ant/OAI's API pricing.

[1] https://x.com/SemiAnalysis_/status/2064815045767213400?ref=w...

bayarearefugee 5 hours ago | parent | next [-]

> it may underestimate the inference margins of Ant/OAI's API pricing.

If true then why are neither Anthropic or OpenAI dropping their API pricing to gain market share when both are clearly doing all sorts of political and PR maneuvering to compete in a cutthroat market?

Since they aren't dropping the API usage prices (and are in fact raising them in a lot of subtle ways) then one of these options almost has to be true: they are still subsidizing inference, training costs are so ridiculously high that they need to make huge profits off inference or collapse in on themselves, or they are price fixing.

827a an hour ago | parent | next [-]

Company-wide their margins are trash (probably negative). They need as much inference margin as they can get to afford the massive training runs. It is likely that we'll see GPT-5.6 reduce API pricing to compete against Anthropic, but whether Anthropic feels they need to reduce their prices is anyone's guess.

CuriouslyC 5 hours ago | parent | prev | next [-]

The training costs are very likely the reason. Dario has talked about how each individual model is profitable, but how the expenditure training the next generation of models makes it look like they're not profitable at any given moment in time, and I believe he's being honest about that.

The market for open weight model hosting gives you an idea of the profitable price floor, it's pretty clear there's markup baked into OAI/Anthropic's APIs.

orangecat 4 hours ago | parent | prev [-]

If true then why are neither Anthropic or OpenAI dropping their API pricing

They are? In the before times of 2025, Opus 4.1 was $75 per million tokens. Opus 4.8 is $25, and Fable is/was $50.

minraws 5 hours ago | parent | prev | next [-]

Given my experience with hosting these models at scale, working and optimizing load, I don't think the margins are nearly as high as 75% if the models are as big as people often claim.

Only reason deepseek is so cheap is because well I don't know, but actual pricing should be around their initial price which was 4x, at that price you have a healthy 25-50% margin based on occupancy, given the deepseek v4 is a very sparse moe model.

GLM 5.2 for example doesn't have more than 30-50% margins that's assuming old pricing for GPUs, current inflated GPU pricing well I am certain the margins must be lower. Ofc you can host for cheaper with quantization, and if you have very consistent capacity/utilization, which is not the norm with AI workloads.

Overall for large models like GPT 5.5 or Opus there must be healthier margins of around 50-70% assuming GPU pricing didn't increase for these companies. Even if it did 30-40% margin should be possible, even in worst case assuming all GPU they had saw a jump in pricing.

For smaller models it's hard to say, I would guess 20% but these models might be much smaller than I suspect, then it might be double that.

Note the issue is less intelligent tokens don't linearly scale down in memory usage, which is the biggest pain point of serving models. Context sizes have fucked us all.

Also anyone claiming OAI makes less margins on APIs or stuff might be wrong given they are on much lower context size, 1M context definitely is a lot more expensive to serve especially with smaller models like sonnet.

andrekandre 4 hours ago | parent | prev | next [-]

  > Neither Anthropic nor OpenAI allow Business nor Enterprise customers access to the high value $200/mo plan. 
they may not "allow" it, but i've seen first hand enterprises encourage employees to use these accounts personally and get reimbursed later to avoid pay-as-you-go w/limits pricing for users who do tokenmaxing as a cost control measure...
surgical_fire an hour ago | parent | prev [-]

> Neither Anthropic nor OpenAI are subsidizing enterprise customers

> Neither Anthropic nor OpenAI allow Business nor Enterprise customers access to the high value $200/mo plan. Both organizations have moved to a "cheaper plan per user + API Pricing after that" (e.g. $20/mo + usage).

I actually think that even the API pricing of OpenAI and Anthropic are still subsidized. I don't think they make any profit on inference when you factor in depreciation. They likely still operate that at a loss.

It's no coincidence that Anthropic only had a "profitable" EBITDA with not paying Elon for compute for a bit of time, and when EBITDA curiously ignores depreciation. Models grow stale over time, as knowledge is not static.

HDThoreaun an hour ago | parent [-]

I don’t see any reason to believe this. If you compare their api prices to open router you see they charge 10x as much. Sure their models are probably bigger, but they have economy of scale on their side, and I doubt their models are 10x bigger.

surgical_fire an hour ago | parent [-]

It's impossible to tell at this point. I have no idea how much of their compute is also subsidized through deals they have with the hyperscalers, etc.

It's irrelevant how big their models may or may not be. Depreciation needs to be taken into account, so does actual compute expenses. Training those models is not cheap, and you will never reach a point where a model is "final". You will always need to train the next one.

Eventually the bill has to be paid. Money and resources are finite still.

HDThoreaun 39 minutes ago | parent [-]

Well the third party operators on open router are assuredly operating at a profit, including depreciation. The only reason they’d be profitable at 1/10 the price of the labs while the labs aren’t profitable is if inference costs the labs 10x as much per token