No, it doesn't cost Anthropic $5k per Claude Code user(martinalderson.com)
118 points by jnord 8 hours ago | 63 comments
hirako2000 an hour ago | parent | next [-]

> Qwen 3.5 397B-A17B is a good comparison

It is not. It's a terrible comparison. Qwen, DeepSeek and other Chinese models are known for being 10x or more efficient than Anthropic's.

That's why the difference between OpenRouter prices and those official providers' prices isn't that big. Plus, who knows what the OpenRouter-listed providers do in terms of quantization. They may be getting 100x better efficiency, hence the competitive price.

That being said, not all users max out their plan, so it's not like each user costs Anthropic 5,000 USD. The hemorrhage would be so brutal they would be out of business in months.

jychang 26 minutes ago | parent | next [-]

That's circular reasoning. People think Chinese models are 10x more efficient because they're 10x cheaper, and then you use that to claim that they're 10x more efficient.

Opus isn't that expensive to host. Look at Amazon Bedrock's tokens/sec numbers for Opus 4.5 vs other Chinese models. They're around the same order of magnitude, which suggests that Opus has roughly the same number of active params as the Chinese models.

Also, you can select BF16 or Q8 providers on OpenRouter.

lelanthran 23 minutes ago | parent | prev | next [-]

> That being said, not all users max out their plan,

These are not cell phone plans that the average Joe buys; they are plans purchased with the explicit goal of software development.

I would guess that 99 out of every 100 plans are purchased with the explicit goal of maxing them out.

serial_dev 11 minutes ago | parent [-]

I’m not maxing them out… I have issues that I need to fix, features I need to develop, and I have things I want to learn.

When I have a feeling that these tools will speed me up, I use them.

My client pays for a couple of these tools in an enterprise deal, and I suspect most of us on the team work like that.

If my goal was to max out every tool my client pays for, I'd be working 24 hours a day and never see sunlight.

I guess it's like an all-you-can-eat buffet. Everybody eats a lot, but if you eat so much that you throw up and get sick, you are special.

simianwords 33 minutes ago | parent | prev [-]

> It is not. It's a terrible comparison. Qwen, DeepSeek and other Chinese models are known for being 10x or more efficient than Anthropic's.

I find it a good comparison because it's a useful baseline, given that we have zero insider knowledge of Anthropic. It gives me an idea of what a model of a certain size costs to serve.

I don't buy the 10x efficiency thing: they are just lagging behind the performance of current SOTA models. They perform much worse than the current models while also costing much less - exactly what I would expect. Current Qwen models perform about as well as Sonnet 3, I think. Two years from now, when Chinese models catch up with enough distillation attacks, they'll be as good as Sonnet 4.6 and still be profitable.

eaglelamp an hour ago | parent | prev | next [-]

If Anthropic's compute is fully saturated, then the Claude Code power users do represent an opportunity cost to Anthropic much closer to $5,000 than $500.

Anthropic's models may be similar in parameter size to models on OpenRouter, but none of the others are in the headlines nearly as much (especially recently), so the comparison is extremely flawed.

The argument in this article is like comparing the cost of a Rolex to a random brand of mechanical watch based on gear count.

d1sxeyes an hour ago | parent | next [-]

But opportunity cost is not actual cost. “If everyone just kept paying but used our service less we would be more profitable” is true, but not in any meaningful way.

Are Anthropic currently unable to sell subscriptions because they don’t have capacity?

bob1029 11 minutes ago | parent | prev | next [-]

> If Anthropic's compute is fully saturated, then the Claude Code power users do represent an opportunity cost to Anthropic much closer to $5,000 than $500.

I think it's the other way around? Sparse use of GPU farms should be the more expensive thing. Full saturation means that we can exploit batching effects throughout.

KronisLV 39 minutes ago | parent | prev | next [-]

Don’t give them any ideas, please! I need my 100 USD subscription with generous Opus usage!

Aeolun an hour ago | parent | prev | next [-]

Opportunity cost is not the same thing as actual cost. They might have made more money if they were capable of selling the API instead of CC, but I would never tell my company to use CC all the time if I didn’t have a personal subscription.

eaglelamp an hour ago | parent [-]

You’re looking through the wrong end of the telescope. An investor is buying opportunity and it is a real cost to them.

NooneAtAll3 36 minutes ago | parent | prev | next [-]

> The argument in this article is like comparing the cost of a Rolex to a random brand of mechanical watch on gear count

I mean... Rolex is an overpriced brand whose cost to consumers is mainly marketing. Its production cost is nowhere close to its selling price, and looking at gears is a fair way of evaluating that.

YetAnotherNick 25 minutes ago | parent | prev [-]

You can rent the GPUs and everything needed to run the model. Opportunity cost is not a real cost here.

jeff_antseed 11 minutes ago | parent | prev | next [-]

The OpenRouter comparison is interesting because it shows what happens when you have actual supply-side competition: multiple providers, different quantizations, price competition. The spread between the cheapest and priciest provider for the same model can be 3-5x.

Anthropic doesn't have that: single provider, single pricing decision. Whether or not $5k is accurate, the more interesting question is what happens to inference pricing when the supply side is genuinely open. We're seeing hints of it with OpenRouter, but it's still intermediated.

Not saying this solves Anthropic's cost problem, just that the "what does inference actually cost" question gets a lot more interesting when providers are competing directly.

ymaws 2 hours ago | parent | prev | next [-]

How confident are you in the Opus 4.6 model size? I've always assumed it was a beefier model with more active params than Qwen 397B (17B active on the forward pass).

Bolwin an hour ago | parent | next [-]

Yeah, that's a massive assumption they're making. I remember Musk revealed Grok was multiple trillion parameters. I find it likely Opus is larger.

I'm sure Anthropic is making money off the API, but I highly doubt it's a 90% profit margin.

jychang 23 minutes ago | parent | next [-]

> I find it likely Opus is larger.

Unlikely. Amazon Bedrock serves Opus at 120 tokens/sec.

If you want to estimate the actual price to serve Opus, a good rough estimate is to take the highest price among DeepSeek, Qwen, Kimi, and GLM and multiply it by 2-3. That would be a pretty close guess at the actual inference cost for Opus.

It's impossible for Opus to have something like 10x the active params of the Chinese models. My guess is around 50-100B active params and 800-1600B total params. I could be off by a factor of ~2, but I know I'm not off by a factor of 10.
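This heuristic can be sketched in a few lines. As an illustration (not jychang's actual numbers), it plugs in the per-1M-token OpenRouter prices quoted elsewhere in this thread for DeepSeek v3.2 ($0.26 in / $0.40 out) and Kimi K2.5 ($0.45 in / $2.25 out); the model list and prices are examples, not a verified cost model:

```python
# Take the priciest open-weight Chinese model and scale by 2-3x as a
# rough proxy for Opus serving cost. Prices are USD per 1M tokens
# (input, output), as quoted in this thread; treat them as illustrative.

open_weight_prices = {
    "deepseek-v3.2": (0.26, 0.40),
    "kimi-k2.5": (0.45, 2.25),
}

def opus_cost_estimate(prices, low=2.0, high=3.0):
    """Return a (low, high) band for output-token serving cost."""
    max_output = max(out for _, out in prices.values())
    return (max_output * low, max_output * high)

lo, hi = opus_cost_estimate(open_weight_prices)
print(f"estimated Opus output cost: ${lo:.2f}-${hi:.2f} per 1M tokens")
# -> estimated Opus output cost: $4.50-$6.75 per 1M tokens
```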

simianwords 18 minutes ago | parent [-]

Are you sure you can use tps as a proxy?

aurareturn an hour ago | parent | prev [-]

Anthropic CEO said 50%+ margins in an interview. I'm guessing 50 - 60% right now.

daemonologist 2 hours ago | parent | prev | next [-]

Even if it's larger, OpenRouter has DeepSeek v3.2 (685B total / 37B active) at $0.26/$0.40 and Kimi K2.5 (1T total / 32B active) at $0.45/$2.25 per 1M tokens (mentioned in the post).

johndough an hour ago | parent [-]

Opus 4.6 likely has in the order of 100B active parameters. OpenRouter lists the following throughput for Google Vertex:

    42 tps for Claude Opus 4.6 https://openrouter.ai/anthropic/claude-opus-4.6
    143 tps for GLM 4.7 (32B active parameters) https://openrouter.ai/z-ai/glm-4.7
    70 tps for Llama 3.3 70B (dense model) https://openrouter.ai/meta-llama/llama-3.3-70b-instruct
For GLM 4.7, that makes 143 * 32B = 4576B parameters per second, and for Llama 3.3, we get 70 * 70B = 4900B, which makes sense since denser models are easier to optimize. As a lower bound, we get 4576B / 42 ≈ 109B active parameters for Opus 4.6. (This makes the assumption that all three models use the same number of bits per parameter and run on the same hardware.)
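The estimate above can be written out as a short calculation, using the same assumption (all three models at the same precision on the same hardware, so each provider pushes a similar number of active-params-times-tokens per second):

```python
# Back-of-the-envelope: solve for Opus's active parameter count from
# published throughputs, assuming a shared "params processed per second"
# budget. Throughputs are the OpenRouter/Vertex numbers quoted above.

glm_tps, glm_active_b = 143, 32      # GLM 4.7: 143 tok/s, 32B active
llama_tps, llama_params_b = 70, 70   # Llama 3.3 70B (dense): 70 tok/s
opus_tps = 42                        # Claude Opus 4.6 on Vertex

# "Billions of params processed per second" for the known models:
glm_budget = glm_tps * glm_active_b        # 143 * 32 = 4576
llama_budget = llama_tps * llama_params_b  # 70 * 70 = 4900

# Use the smaller budget as a conservative lower bound for Opus:
opus_active_b = min(glm_budget, llama_budget) / opus_tps
print(f"~{opus_active_b:.0f}B active parameters")  # -> ~109B
```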
jychang 18 minutes ago | parent [-]

Yep, you can also get similar analysis from Amazon Bedrock, which serves Opus as well.

I'd say Opus is roughly 2x to 3x the price of the top Chinese models to serve, in reality.

codemog 2 hours ago | parent | prev [-]

Also curious if any experts can weigh in on this. I would guess in the 1 trillion to 2 trillion range.

Chamix an hour ago | parent [-]

Try 10s of trillions. These days everyone is running 4-bit at inference (the flagship feature of Blackwell+), with the big flagship models running on recently installed Nvidia 72-GPU Rubin clusters (and an equivalent-ish world size for those rented Ironwood TPUs Anthropic also uses). Let's see: Vera Rubin racks come standard with 20 TB of unified memory (Blackwell NVL72 with 10 TB), and NVFP4 fits 2 parameters per byte...

Of course, intense sparsification via MoE (and other techniques ;) ) lets total model size largely decouple from inference speed and cost (within the limit of world size via NVLink/TPU torus caps).

So the real mystery, as always, is the actual parameter count of the activated head(s). You can do various speed benchmarks and TPS tracking across likely hardware fleets, and while an exact number is hard to compute, let me tell you, it is not 17B or anywhere in that particular OOM :)

Comparing Opus 4.6 or GPT 5.4 thinking or Gemini 3.1 Pro to any sort of Chinese model (on cost) is just totally disingenuous when China does NOT have Vera Rubin NVL72 GPUs or Ironwood v7 TPUs in any meaningful capacity, and is forced to target 8-GPU Blackwell systems (and worse!) for deployment.

aurareturn 43 minutes ago | parent [-]

China is targeting H20 because that's all they were officially allowed to buy.

Chamix 34 minutes ago | parent [-]

I generally agree. Back-of-the-napkin math shows an H20 cluster of 8 GPUs * 96 GB = 768 GB = room for roughly 768B parameters at FP8 (no NVFP4 on Hopper), which lines up pretty nicely with the sizes of recent open-source Chinese models.
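The napkin math above reduces to one division. A minimal sketch, treating the GPU counts, memory sizes, and the NVFP4 figure from upthread as the commenters' claims rather than verified specs (and ignoring KV cache, activations, and replication, all of which shrink the real number):

```python
# Upper bound on servable parameter count: total cluster memory divided
# by bytes per parameter. 1 GB holds 1B one-byte parameters.

def max_params_billions(total_gb, bytes_per_param):
    return total_gb / bytes_per_param

# 8x H20 (96 GB each) serving FP8 (1 byte/param):
print(max_params_billions(8 * 96, 1.0))   # -> 768.0, i.e. ~768B params

# 20 TB Vera Rubin rack at NVFP4 (2 params/byte, i.e. 0.5 bytes/param):
print(max_params_billions(20_000, 0.5))   # -> 40000.0, i.e. ~40T params
```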

However, I'd say it's relatively well assumed in realpolitik land that Chinese labs managed to acquire plenty of H100/H200 clusters and even meaningful numbers of B200 systems semi-illicitly before the regulations and anti-smuggling measures really started to crack down.

This does raise the question of how nicely the closed-source variants, of undisclosed parameter counts, fit within the 1.1 TB of H200 or 1.5 TB of B200 systems.

aurareturn 3 minutes ago | parent [-]

They do not have enough H200 or Blackwell systems to serve 1.6 billion people plus the rest of the world, so I doubt it's any meaningful number.

n_u 2 hours ago | parent | prev | next [-]

Good article! Small suggestions:

1. It would be nice to define terms like RSI or at least link to a definition.

2. I found the graph difficult to read. It's a computer font that is made to look hand-drawn and it's a bit low resolution. With some googling I'm guessing the words in parentheses are the clouds the model is running on. You could make that a bit more clear.

z3ugma 3 hours ago | parent | prev | next [-]

This is such a well-written essay. Every line revealed the answer to the immediate question I had just thought of

lovecg 2 hours ago | parent [-]

I can’t get past all the LLM-isms. Do people really not care about AI-slopifying their writing? It’s like learning about bad kerning, you see it everywhere.

weird-eye-issue 2 hours ago | parent | next [-]

I think you're just hallucinating because this does not come across as an AI article

lovecg 2 hours ago | parent | next [-]

I see quite a few:

“what X actually is”

“the X reality check”

Overuse of “real” and “genuine”:

> The real story is actually in the article. … And the real issue for Cursor … They have real "brand awareness", and they are genuinely better than the cheaper open weights models - for now at least. It's a real conundrum for them.

> … - these are genuinely massive expenses that dwarf inference costs.

This style just screams “Claude” to me.

lelanthran 19 minutes ago | parent | prev | next [-]

> I think you're just hallucinating because this does not come across as an AI article

It has enough tells in the correct frequency for me to consider it more than 50% generated.

hansvm an hour ago | parent | prev | next [-]

It was almost certainly at least heavily edited with one. Ignoring the content, every single thing about the structure and style screams LLM.

NetOpWibby 2 hours ago | parent | prev [-]

Name checks out

rhubarbtree 15 minutes ago | parent | prev | next [-]

It is certainly very obvious a lot of the time. I wonder if we revisited the automated slop detection problem we’d be more successful now… it feels like there are a lot more tells and models have become more idiosyncratic.

152334H an hour ago | parent | prev | next [-]

People care, when they can tell.

Popular content is popular because it is above the threshold for average detection.

In a better world, platforms would empower defenders, by granting skilled human noticers flagging priority, and by adopting basic classifiers like Pangram.

Unfortunately, mainstream platforms have thus far not demonstrated strong interest in banning AI slop. This site in particular has actually taken moderation actions to unflag AI slop on certain occasions...

Erem 2 hours ago | parent | prev [-]

I don’t see the usual tells in this essay

brianjeong 3 hours ago | parent | prev | next [-]

These margins are far greater than the ones Dario has indicated during many of his recent podcast appearances.

skybrian 2 hours ago | parent [-]

What did he say?

aurareturn an hour ago | parent | prev | next [-]

By the way, one of the charts in the article shows that Opus 4.6 is 10x costlier than Kimi K2.5.

I thought there was no moat in AI? Even being 10x costlier, Anthropic still doesn't have enough compute to meet demand.

Those "AI has no moat" opinions are going to be so wrong so soon.

spiderice an hour ago | parent [-]

Claude Code Max obviously doesn't cost 10x more than Kimi. The article even confirms that you can get $5k worth of compute for $200 with Claude Code Max.

So no, Claude would not be getting NEARLY as much usage as it's currently getting if it weren't for the $100/$200 monthly subscription. You're comparing Kimi to the price that most people aren't paying.

scuff3d 31 minutes ago | parent | prev | next [-]

This article is hilariously flawed, and it takes all of 5 seconds of research to see why.

Alibaba is the primary comparison point made by the author, but it's a completely unsuitable comparison. Alibaba is closer to AWS than to Anthropic in terms of business model. They make money selling infrastructure, not inference. It's entirely possible they see inference as a loss leader and are willing to offer it at cost or below to drive people onto their platform.

We also have absolutely no idea if it's anywhere near comparable to Opus 4.6. The author is guessing.

So the article's primary argument is based on a comparison to a company with an entirely different business model, running a model the author is just making wild guesses about.

simianwords 14 minutes ago | parent [-]

What? AWS is a good comparison if you want only infra-level costs, which is what the post is talking about.

hattmall 2 hours ago | parent | prev | next [-]

Is it fair to say the OpenRouter models aren't subsidized, though? The author makes the case that the companies on there are running a business, but there are free models, and companies with huge AI budgets that want to gather training data and show usage.

gmerc 3 hours ago | parent | prev | next [-]

Nobody gets RSI typing “iterate until tests pass”

arthurcolle 2 hours ago | parent | next [-]

Recursive self improvement and Repetitive Strain Injury being the same initialism is really funny to me

rs_rs_rs_rs_rs an hour ago | parent | prev [-]

Honest questions: have you never heard of hyperbole before, and are you on the spectrum?

functionmouse 8 hours ago | parent | prev | next [-]

Was anyone under the impression that it does? Serious question. I've never heard that, personally.

versteegen an hour ago | parent | next [-]

Ed Zitron made that claim (in particular here: [1]). In the same article he admits he's not a programmer and had to ask someone else to try out Claude Code and ccusage for him. He doesn't have any understanding of how LLMs or caching work. But he's prominent because he's received leaked financial details for Anthropic and OpenAI, e.g. [2].

[1] https://www.wheresyoured.at/anthropic-is-bleeding-out/ [2] https://www.wheresyoured.at/costs/

simianwords 32 minutes ago | parent | prev | next [-]

You would be surprised because there are lots of posters here who think that the cost is so enormous that this whole industry is unviable.

crazygringo 3 hours ago | parent | prev | next [-]

I mean, the very first paragraph of TFA is describing who is under that impression. Literally the first sentence:

> My LinkedIn and Twitter feeds are full of screenshots from the recent Forbes article on Cursor claiming that Anthropic's $200/month Claude Code Max plan can consume $5,000 in compute.

fulafel an hour ago | parent [-]

That's claiming that, in the worst case, a subscriber _can_ use that much. It's possible that's wrong too, but in any case a lot of services are built on the assumption that the average user doesn't max out the plan.

So the article's title is obviously sensationalized.

dimgl 3 hours ago | parent | prev [-]

Twitter.

beepbooptheory 2 hours ago | parent | prev | next [-]

Ok but so it does cost Cursor $5k per power-Cursor user?? Still seems pretty rough..

scriptsmith 2 hours ago | parent | next [-]

Yes, you could turn it around to say that using Anthropic models in Cursor, Copilot, Junie, etc. is 'subsidising' Claude Code users.

arthurcolle 2 hours ago | parent | prev | next [-]

$5 = $5

but $5 that I amortize over 7 years might end up being $1.7 maybe if I don't rapidly combust (supply chain risk)

unlimit 2 hours ago | parent | prev | next [-]

I wonder how they are defining a power user. How many tokens? What could be the size of the code base?

dietr1ch 2 hours ago | parent [-]

The $5k power user is the one that consistently uses all input and output tokens available under the Max subscription

oefrha 2 hours ago | parent | prev [-]

No, to use $5k in Cursor you have to pay $5k.

fnord77 2 hours ago | parent | prev [-]

> I'm fairly confident the Forbes sources are confusing retail API prices with actual compute costs

Aren't they losing money on the retail API pricing, too?

> ... comparisons to artificially low priced Chinese providers...

Yeah, no, this article does not pass the sniff test.

versteegen an hour ago | parent [-]

> Aren't they losing money on the retail API pricing, too?

No, they aren't, and probably neither is anyone else offering API pricing. And Anthropic's API margins may be higher than anyone else's.

For example, DeepSeek released numbers showing that R1 was served at approximately "a cost profit margin of 545%" (meaning roughly 84% of revenue is profit), see my comment https://news.ycombinator.com/item?id=46663852
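The conversion from DeepSeek's quoted figure is pure arithmetic: a "cost profit margin" of 545% means profit is 5.45x cost, so revenue is 6.45x cost, and profit as a share of revenue is 5.45 / 6.45:

```python
# Convert a cost-profit margin (profit / cost) into profit as a share
# of revenue (profit / (cost + profit)).

cost_profit_margin = 5.45              # DeepSeek's reported 545%
revenue_multiple = 1 + cost_profit_margin   # revenue = 6.45x cost
profit_share = cost_profit_margin / revenue_multiple
print(f"{profit_share:.1%}")           # -> 84.5%
```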

bandrami 12 minutes ago | parent [-]

Weird that they're all looking for outside money then