twoodfin 3 hours ago

From the limited perspective of software development, today’s models are well-worth their per-token cost.

This reads to me like Anthropic anticipating demand and making a commitment to acquire supply. Not unlike airlines committing to future jet fuel purchases, or Apple committing to future DRAM volume.

an0malous 3 hours ago | parent | next [-]

> From the limited perspective of software development, today’s models are well-worth their per-token cost.

At the current price or the real price? Anthropic said a $200 subscription can cost them $5000, so the real price could be anywhere from 10-30x the current price.
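The implied multiple is simple division (the $5000 figure is the claim above, not something I can verify):

```python
# Back-of-envelope using the figures above; the $5000 worst case is hearsay.
subscription_price = 200      # $/month subscription price
claimed_serving_cost = 5000   # $/month claimed worst-case cost to serve one user
multiple = claimed_serving_cost / subscription_price
print(multiple)  # 25.0 -- within the 10-30x range, if the claim holds
```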

RealityVoid 2 hours ago | parent | next [-]

No, that is probably one of the worst cases they saw. Most likely the subscription inference cost is much lower than you expect. If you look at the costs of similar open models, they are much lower than what you pay buying from Anthropic, so that is the real cost basis I expect.

It's likely Amazon is making a fucking killing though.

SlinkyOnStairs 2 hours ago | parent | next [-]

While $5000 is a lot, people who rack up close to or just over a thousand dollars in "API equivalent cost" are pretty common.

> Most likely the subscription inference cost is much lower than you expect.

This is probably not true because they'd be screaming it off every rooftop were that the case.

Same deal with API inference. Even the "profitable on inference" claim is sourced back to hearsay about informal statements made by OpenAI/Anthropic staff. No formal announcements, nothing remotely of the "you can trust what I'm saying, because if I'm lying the SEC will have my head" sort.

Yet making such statements would be invaluable. If Anthropic can demonstrate profitability before OpenAI, they could poach most of the funding. There's no reason to keep it a company secret.

And API inference is only part of the total costs, not even counting training and ongoing fine-tuning. If they're not even profitable on inference, how could they hope to be profitable overall?

nielsole 2 hours ago | parent | next [-]

I don't know about SEC rules, but the Anthropic CEO said they have a 50%+ margin on API pricing.

stackskipton an hour ago | parent | next [-]

SEC rules mean the CEO cannot lie about or deliberately hide the cost of something.

The 50%+ margin statements have basically been "we are making 50% on delivering it." This does not include ANY of the costs of getting to this point: training, scraping, datacenters, people, and so forth.

They are basically saying "oh yeah, the cost of gas in the car is only X, so charging Y per mile is great margin" while ignoring maintenance, the cost of acquiring the car, and so forth.
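To make the analogy concrete, here is a sketch with purely hypothetical per-mile numbers (none of these figures come from the thread):

```python
# All figures hypothetical, chosen only to show marginal vs. fully loaded margin.
price_per_mile = 0.50
gas_per_mile = 0.25     # the only cost counted in the rosy "great margin" claim
fixed_per_mile = 0.40   # car purchase + maintenance, amortized per mile

marginal_margin = 1 - gas_per_mile / price_per_mile                 # 50% on gas alone
full_margin = 1 - (gas_per_mile + fixed_per_mile) / price_per_mile  # negative overall
print(marginal_margin, full_margin)  # margin flips to a loss once everything counts
```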

SlinkyOnStairs an hour ago | parent | prev [-]

I'm going to be a dickhead for a moment here, apologies; there's no way to say this that isn't rude to you. This is still the same hearsay: "in an interview, somewhere."

A bit of Google searching gets us a specific interview: https://www.dwarkesh.com/p/dario-amodei-2

> Let’s say half of your compute is for training and half of your compute is for inference. The inference has some gross margin that’s more than 50%.

But for context, the very previous sentence is:

> Think about it this way. Again, these are stylized facts. These numbers are not exact. I’m just trying to make a toy model here.

Here, Amodei is in effect using weasel words. He is not making any actionable claims about Anthropic's margins, merely plucking an arbitrary number. Why 50%? Is 50% reasonable? Is 50% accurate to the company? Those are all conclusions the listener draws, not Amodei.
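For what it's worth, plugging the toy model's own numbers in shows why a 50% inference gross margin says nothing about overall profitability (these are Amodei's stylized figures, explicitly "not exact"):

```python
# Half of compute for training, half for inference, 50% gross margin on inference.
total_compute_cost = 100                  # arbitrary units
training_cost = total_compute_cost / 2    # sunk into training, earns nothing directly
inference_cost = total_compute_cost / 2
revenue = inference_cost / (1 - 0.5)      # revenue implied by a 50% gross margin

overall_profit = revenue - total_compute_cost
print(overall_profit)  # 0.0 -- at exactly 50%, training eats the entire margin
```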

> I don't know about SEC rules

The main premise is that, as a CEO, there are regulations you are beholden to. You're not allowed to announce you've made a trillion-dollar profit, sell all your stock, and then go "teehee, just kidding." The SEC will prosecute you for securities fraud if you do that stuff.

This makes weasel words like the ones above suspicious, because the exact statement Amodei gives is not prosecutable. He's not saying anything about the company, just doing a little "toy model."

The degree to which it is intentional that this hearsay travels and gets extrapolated from "well, he picked 50% because it's a reasonable figure, and because he's the CEO, a reasonable figure would have to be one akin to what his company can achieve" into "Anthropic has a 50% margin" is up for debate. Maybe it is intentional; maybe Amodei is exactly the same kind of shitweasel as Altman. Probably he's just a dumbass who runs his mouth in interviews and for whatever reason cannot issue the true number in an authoritative statement to dismiss this misconception.

Hence my original comment: if the real number were better than the hearsay rumours, Amodei would immediately issue a correction; it'd be great for the company. Hell, even if 50% were about the margin, that'd be great! Promoting that from mere hearsay to "we're profitable, go invest all your money" would also be huge. Really, any kind of margin at all would put him ahead of OpenAI.

But he doesn't issue a correction. He doesn't affirm the statement. Perhaps he has other reasons for that, but a rather big reason could be that the margin number is in fact pretty bad.

Now, the observant reader will note I am also using a weasel word there. I do not know whether the number is good or bad; your takeaway should be "it could be bad," not "it is bad." Go pressure Amodei into giving us the real number.

dminik 38 minutes ago | parent [-]

Interesting. So the 50%+ number that's been floating about isn't even real. It's just an example.

PunchyHamster 2 hours ago | parent | prev [-]

The "worst case" is probably just someone maxing out their $200 account's limits. So yeah, the real cost is probably close to that.

kiratp an hour ago | parent | prev [-]

At the full current retail API price.

Business buyers are paying API prices, not subscription prices.

Disclosure: Work at Microsoft on AI

svnt an hour ago | parent | prev | next [-]

And receiving investment from their vendor in exchange? When this is done in established companies it is typically called a kickback and directed toward one person, but in this case the whole thing is so incestuous the kickback goes straight to the top.

twoodfin an hour ago | parent [-]

Is it crazy to imagine Anthropic can leverage short term cash flow now to build the models and products that will let them resell $100B in AWS infra with nice margins tomorrow?

If Amazon believes that story they’d be crazy not to invest.

svnt 44 minutes ago | parent [-]

Yes I understand why the agreement exists, but that does not remove the circularity.

sandworm101 3 hours ago | parent | prev [-]

But that per-token cost is a total joke. All these companies are fighting to build market share in some future dominated by one or two AI ecosystems. It is musical chairs until someone creates the one ring to rule them all. So they are charging token amounts just to claim revenue as they burn through investor dollars.

In short: per-token charges currently cover maybe 1% of the total costs in this field. To pay ongoing costs, and pay back investors, everyone will need to pay 100x or 1000x the current rates, likely for decades.

red_hare an hour ago | parent | next [-]

If that's true, it's very unsustainable.

Gemma-4 26B-A4B + M5 MacBook Pro + OpenCode isn't Claude Code _yet_, but it's good enough that if I were forced to use it I would be fine.

jcgrillo an hour ago | parent [-]

Yes, it's amazing how quickly so many tech companies have hitched their tooling to these big AI vendors seemingly without any thought towards whether they'll still exist a year or three or five from now. Insane behavior. To the (debatable!) extent that AI coding tools are useful at all wouldn't it be a hell of a lot smarter to self-host? At least that way you have some control over QoS, and a stable, predictable result... Or maybe nobody cares about that kind of thing anymore? What happened to basic business math in this industry?

matrik an hour ago | parent | prev | next [-]

I'm not sure this information is well grounded, but I remember reading somewhere that inference is indeed profitable. My personal experience is similar: running 2x3090s draws 500-600W, and you can locally run amazing models with a similar setup.
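The electricity side of that setup is easy to sanity-check (the 500-600W draw is from the comment above; the electricity price is my own assumption):

```python
# ~550 W average draw for 2x3090, at an assumed $0.15/kWh (varies a lot by region).
power_kw = 0.55
price_per_kwh = 0.15
cost_per_hour = power_kw * price_per_kwh
print(cost_per_hour)  # roughly $0.08 per hour of local inference
```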

sandworm101 an hour ago | parent [-]

Running the model isn't the cost. Watts per token is the math they show investors. You also have to be constantly training new models, which currently needs more compute than servicing the customer base. You have to build datacenters, and possibly power plants to feed them. You have to carry debts. And you will need to buy new GPUs/RAM every few years to remain competitive. The total business is vastly different from simple GPU math.

deaux 2 hours ago | parent | prev | next [-]

> In short: per-token charges currently cover maybe 1% of the total costs in this field

There are plenty of seemingly informed people saying the exact opposite, so that's a lot of confidence you're talking with. I have a hard time believing it when we know what open weights models cost to run. And sure, there's training costs, but again many say inference costs are already above training costs.

twoodfin 2 hours ago | parent | prev | next [-]

From the perspective of a deal like this, “total costs in the field” matter less than incremental cost per token served.

The unit economics for today’s frontier models should be great, and this suggests Anthropic believes they’ll get better.

postalrat 2 hours ago | parent | prev [-]

In a decade the cost of compute will be a tiny fraction of what it costs now. Specialized hardware will exist that will be cheap and efficient.

bitmasher9 an hour ago | parent [-]

The difference in the cost of compute between 2026 and 2036 won't be nearly as large as the difference between 2016 and 2026. Even by 2016 the slowdown in improvements was noticeable.

We might see a one-time bump in inference when we move off GPUs onto more limited and efficient dedicated hardware, but the sustained fast pace of improvements is far behind us.

oceansky 4 minutes ago | parent | next [-]

Compute power improvement between 2016 and 2026 wasn't that impressive either. Moore's law is essentially dying.

postalrat 33 minutes ago | parent | prev [-]

I'm predicting that, now that there is a clear use case for this tech, work will accelerate (and already has) on specialized hardware, software, models, etc. that will run much more efficiently in 10 years, so that real token costs will be a fraction of what they are now.