paxys 4 hours ago

I don't think it's a secret that AI companies are losing a ton of money on subscription plans. Hence the stricter rate limits, new $200+ plans, push towards advertising etc. The real money is in per-token billing via the API (and large companies having enough AI FOMO that they blindly pay the enormous invoices every month).

mirzap 3 hours ago | parent | next [-]

They are not losing money on subscription plans. Inference is very cheap - just a few dollars per million tokens. What they’re trying to do is bundle R&D costs with inference so they can fund the training of the next generation of models.

Banning third-party tools has nothing to do with rate limits. They're trying to position themselves as the Apple of AI companies - a walled garden. They may soon discover that screwing developers is not a good strategy.

They are not 10× better than Codex; on the contrary, in my opinion Codex produces much better code. Even Kimi K2.5 is a very capable model that I find at least on par with Sonnet and very close to Opus. Forcing people to use ONLY the broken Claude Code UX with a subscription only ensures they lose whatever advantage they had.

rjh29 2 hours ago | parent | next [-]

> "just a few dollars per million tokens"

Google AI Pro is like $15/month for practically unlimited Pro requests, each of which can take a million tokens of context (and also perform thinking, free Google Search grounding, and inline image generation if needed). This includes Gemini CLI, Gemini Code Assist (VS Code), the main chatbot, and a bunch of other vibe-coding products, which have their own rate limits or no rate limits at all.

It's crazy to think this is sustainable. It'll be like Xbox Game Pass - start at £5/month to hook people in and before you know it it's £20/month and has nowhere near as many games.

harrall an hour ago | parent [-]

OpenAI only released ChatGPT 4 years ago but…

Google has made custom AI chips for 11 years — since 2015 — and inference costs them 2-5x less than it does for every other competitor.

The landmark paper that invented the techniques behind ChatGPT, Claude and modern AI was also published by Google scientists 9 years ago.

That’s probably how they can afford it.

illiac786 10 minutes ago | parent [-]

I agree that the TPUs are one of the underestimated factors here (based on my personal reading of HN).

Google already has a huge competitive advantage: it has more data than anyone else, controls the Android platform, and bundles Gemini into every Android phone to siphon even more data. The TPUs make me believe there could actually be a sort of monopoly on LLMs in the end, even though there are so many good open-weight models and so few (technical) reasons to build software that only integrates with Gemini.

Google will have the lion's share of inference, I believe. OpenAI and Anthropic will have a very hard time fighting this.

gbear605 an hour ago | parent | prev | next [-]

I’m not familiar with the Claude Code subscription, but with Codex I’m able to use millions of tokens per day on the $200/mo plan. My rough estimate was that if I were API billing, it would cost about $50/day, or $1200/mo. So either the API has a 6x profit margin on inference, the subscription is a loss leader, or they just rely on most people not to go anywhere near the usage caps.
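The arithmetic behind that estimate can be sketched quickly. Every figure here is either the commenter's rough number or an assumption, not official pricing; in particular, a ~24 billable-day month is assumed to reconcile "$50/day" with the quoted "$1200/mo".

```python
# Rough check of the estimate above; all numbers are the commenter's
# guesses or assumptions, not official pricing.
api_cost_per_day = 50.0   # commenter's API-equivalent cost estimate, USD
working_days = 24         # assumed billable days/month
subscription = 200.0      # the $200/mo plan

monthly_api_cost = api_cost_per_day * working_days
implied_multiple = monthly_api_cost / subscription

print(f"API-equivalent: ${monthly_api_cost:.0f}/mo")  # $1200/mo
print(f"Implied multiple: {implied_multiple:.0f}x")   # 6x
```

Which of the three explanations (margin, loss leader, or underused caps) that 6x implies depends entirely on the true marginal cost of inference, which outsiders can't see.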

trymas 18 minutes ago | parent [-]

I use the GLM Lite subscription for personal use. It is advertised as 3x the Claude Code Pro plan (the $20 one).

The 5-hour allowance is somewhere between 50M and 100M tokens, from what I can tell.

On the $200 Claude Code plan you would have to burn hundreds of millions of tokens per day to make Anthropic hurt.

IMHO subscription plans are totally banking on many users underusing them. Also, LLM providers don't like to state exact numbers (how much you actually get, etc.).

MikeNotThePope 2 hours ago | parent | prev | next [-]

I wonder how many people have a subscription and don’t fully utilize it. That’s free money for them, too.

hhh an hour ago | parent | prev | next [-]

What walled garden man? There’s like four major API providers for Anthropic.

andersmurphy 43 minutes ago | parent | prev | next [-]

Except all those GPUs running inference need to be replaced every 2 years.

phyrex 25 minutes ago | parent [-]

Why?

xyzsparetimexyz 14 minutes ago | parent [-]

They wear down being run at 100% all the time. Support slowly drops off, the architecture and even the rack format become deprecated.

mvdtnz 24 minutes ago | parent | prev | next [-]

"They're not losing money on subscriptions, it's just their revenue is smaller than their costs". Weird take.

carderne 19 minutes ago | parent [-]

It means the marginal cost to sell another subscription is lower than what they sell it for. I don't know if that's true, but it seems plausible.

KingMob 2 hours ago | parent | prev [-]

> They are not losing money on subscription plans. Inference is very cheap - just a few dollars per million tokens. What they’re trying to do is bundle R&D costs with inference so they can fund the training of the next generation of models.

You've described every R&D company ever.

"Synthesizing drugs is cheap - just a few dollars per million pills. They're trying to bundle pharmaceutical research costs... etc."

There are plenty of legitimate criticisms of this business model and of Anthropic, but pointing out that R&D companies sink money into research and then charge more than marginal cost for the final product isn't one of them.

mirzap 2 hours ago | parent [-]

I’m not saying charging above marginal cost to fund R&D is weird. That’s how every R&D company works.

My point was simpler: they’re almost certainly not losing money on subscriptions because of inference. Inference is relatively cheap. And of course the big cost is training and ongoing R&D.

The real issue is the market they’re in. They’re competing with companies like Kimi and DeepSeek that also spend heavily on R&D but release strong models openly. That means anyone can run inference and customers can use it without paying for bundled research costs.

Training frontier models takes months, costs billions, and the model is outdated in six months. I just don’t see how a closed, subscription-only model reliably covers that in the long run, especially if you’re tightening ecosystem access at the same time.

sambull 4 hours ago | parent | prev | next [-]

The secret is there is no path on making that back.

stingrae an hour ago | parent | next [-]

the path is by charging just a bit less than the salary of the engineers they are replacing.

JimmaDaRustla 4 hours ago | parent | prev | next [-]

My crude metaphor to explain it to my family: gasoline has just been invented, and we're all being lent Bentleys to get us addicted to driving everywhere. Eventually we won't be given free Bentleys, and someone is going to be holding the bag when the infinite money machine finally has a hiccup. The tech giants are each hoping their gasoline is the one we all crave once we're left dependent on driving everywhere and the costs go soaring.

eru 4 hours ago | parent | next [-]

Why? Computers and anything computer related have historically been dropping in prices like crazy year after year (with only very occasional hiccups). What makes you think this will stop now?

shykes 3 hours ago | parent | next [-]

Commodity hardware and software will continue to drop in price.

Enterprise products with sufficient market share and "stickiness", will not.

For historical precedent, see the commercial practices of Oracle, Microsoft, Vmware, Salesforce, at the height of their power.

hugmynutus 2 hours ago | parent [-]

> Commodity hardware and software will continue to drop in price.

The software is free (see: CUDA, nvcc, LLVM, ollama/llama.cpp, Linux, etc.)

The hardware is *not* getting cheaper (unless we're talking a 5+ year horizon), as most manufacturers are signaling the current shortages will continue for ~24 months.

denimnerd42 an hour ago | parent [-]

GB300 NVL72 is 50% more expensive than GB200 I've heard.

Wobbles42 an hour ago | parent | prev | next [-]

It has stopped. Demand is now rising faster than supply in memory, storage, and GPUs.

We see vendors reducing memory in new smartphones in 2026 vs 2025, for example.

At least for the moment, falling consumer tech hardware prices are over.

Ekaros an hour ago | parent | prev | next [-]

On the consumer side, looking at the past few generations, I question that. I would guess we are nearing some sort of plateau there, or are already on it. There was inflation, but even setting aside the recent jump in RAM prices, the gains relative to cost were not that massive.

walterbell 3 hours ago | parent | prev | next [-]

Recent price trends for DRAM, SSDs, hard drives?

aftbit 3 hours ago | parent [-]

Short term squeeze, because building capacity takes time and real funding. The component manufacturers have been here before. Booms rarely last long enough to justify a build-out. If AI demand turns out to be sustained, the market will eventually adapt by building supply, and prices will drop. If AI demand turns out to be transient, demand will drop, and prices will drop.

adrianN 3 hours ago | parent | prev | next [-]

Cars have also been dropping in price.

Wobbles42 an hour ago | parent | next [-]

And knives apparently.

I recently encountered this randomly -- knives are apparently one of the few products that nearly every household has needed since antiquity, and they have changed fairly little since the bronze age, so they are used by economists as a benchmark that can span centuries.

Source: it was an aside in a random economics conversation with ChatGPT (grain of salt?).

There is no practical upshot here, but I thought it was cool.

sciencejerk 3 hours ago | parent | prev [-]

Evidence for this claim?

adrianN 2 hours ago | parent [-]

A few generations ago almost nobody could afford a car; now many low-income families can afford two.

CamperBob2 3 hours ago | parent | prev [-]

In the GP's analogy, the Bentley can be rented for $3/day, but if you want to purchase it outright, it will cost you $3,000,000.

Despite the high price, the Bentley factory is running 24/7 and still behind schedule due to orders placed by the rental-car company, who has nearly-infinite money.

echelon 3 hours ago | parent | prev | next [-]

I like this analogy.

I also think we're, as ICs, being given Bentleys meanwhile they're trying to invent Waymos to put us all out of work.

Humans are the cost center in their world model.

snihalani 3 hours ago | parent | prev [-]

how do I figure out what sustainable pricing would be?

fulafel 3 hours ago | parent | prev | next [-]

Depends on how you do the accounting. Are you counting only inference costs, or are you amortizing next-gen model development costs? "Inference is profitable" is oft repeated and rarely challenged. Most subscription users are low-intensity users, after all.

Someone1234 4 hours ago | parent | prev | next [-]

I agree; unfortunately, when I brought up before that they're losing money, I got jumped on with demands to "prove it", and I guess pointing at their balance sheets isn't good enough.

mattas 4 hours ago | parent | prev | next [-]

The question I have: how much are they _also_ losing on per-token billing?

imachine1980_ 3 hours ago | parent [-]

From what I understand, they make money on per-token billing. Not enough to cover training, marketing, subscription services, and research for new models, but each token served loses them less money.

Finance 101 TL;DR: the contribution margin (= price per token − variable cost per token) is positive.

Profit = contribution margin × quantity − fixed costs.
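That contribution-margin point can be made concrete with toy numbers; every figure below is made up purely to illustrate the shape of the economics, not drawn from any provider's actual pricing or costs.

```python
# Toy unit economics; every figure is hypothetical.
price_per_mtok = 10.0          # revenue per million tokens
variable_cost_per_mtok = 3.0   # inference cost per million tokens
fixed_costs = 2_000_000_000.0  # training + R&D for the period

contribution_margin = price_per_mtok - variable_cost_per_mtok  # positive

def profit(mtok_sold: float) -> float:
    """Profit = contribution margin * quantity - fixed costs."""
    return contribution_margin * mtok_sold - fixed_costs

# Each marginal token is profitable, yet the company still runs at a
# loss until volume covers the fixed (training/R&D) costs:
breakeven_mtok = fixed_costs / contribution_margin
```

This is exactly the tension in the thread: "losing money overall" and "making money on every token served" can both be true at once.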

andersmurphy 32 minutes ago | parent [-]

Do they make enough to replace their GPUs in two years?

dcre 3 hours ago | parent | prev | next [-]

Why do you think they're losing money on subscriptions?

andersmurphy 34 minutes ago | parent | next [-]

Does a GPU doing inference serve enough customers, for long enough, to bring in enough revenue to pay for a replacement GPU in two years (plus the power and running cost of the GPU and its infrastructure)? That's the question you need to be asking.

If the answer is yes, then they are making money on inference. If the answer is no, the market is going to have a bad time.
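That amortization question reduces to simple arithmetic. The GPU price and overhead figures below are assumptions picked for illustration; only the two-year replacement cycle comes from upthread.

```python
# What must one inference GPU earn per month to fund its own
# replacement in two years? All dollar figures are assumptions.
gpu_price = 30_000.0         # hypothetical accelerator cost, USD
lifetime_months = 24         # the two-year replacement cycle claimed upthread
power_and_overhead = 500.0   # hypothetical monthly power + infrastructure, USD

required_monthly_revenue = gpu_price / lifetime_months + power_and_overhead
print(required_monthly_revenue)  # 1750.0
```

Under these assumptions each GPU must clear $1,750/month in inference revenue just to stand still; shorter replacement cycles or pricier hardware push that floor up fast.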

Yossarrian22 3 hours ago | parent | prev [-]

Because they're not saying they are making a profit

mikeg8 3 hours ago | parent [-]

That doesn’t mean that the subscription itself is losing money. The margin on the subscription could be fine, but by using that margin to R&D the next model, the org may still be intentionally unprofitable. It’s their investment/growth strategy, not an indictment of their pricing strategy.

Wobbles42 an hour ago | parent [-]

They have investors who paid for the training of these models too. It could be argued that R&D for the next generation is a separate issue, but they need to provide a return on this generation's R&D to stay in business.

croes 3 hours ago | parent | prev | next [-]

But why does it matter which program you use to consume the tokens?

That sounds like a confession that Claude Code is somewhat wasteful in its token use.

airstrike 3 hours ago | parent [-]

No, it's a confession they have no moat other than trying to hold onto the best model for a given use case.

I find that competitive edge unlikely to last meaningfully in the long term, but this is still a contrarian view.

More recently, people have started to wise up to the view that the value is in the application layer:

https://www.iconiqcapital.com/growth/reports/2026-state-of-a...

hannasm 3 hours ago | parent | prev [-]

Honestly, I think I am already sold on AI. Who is the first company that is going to show us all how much it really costs and start the enshittification? First to market wins, right?