mike_hearn 3 hours ago

Hello, long time no see :)

I know the argument, made by Anthropic, that each individual model is profitable, but I have a hard time believing it. It isn't consistent with what we actually see: losses seem to be driven by systematic underpricing.

An obvious example of this is Sora, recently killed after generating $2.1M in lifetime revenue while reportedly costing anywhere between $1M and $15M per day in compute alone - and that's excluding training costs. It's quite hard to find any historical example of a product being subsidized to that kind of level. So when people say things like "inference is profitable", it's clear they're handwaving away some details, because this was one case where inference was not only unprofitable but comically so. Maybe they mean it's profitable for specific workloads or for specific companies, but it's more likely that the argument is too general to encompass such details.
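To make the scale of that subsidy concrete, here's a back-of-envelope calculation using only the figures quoted above (the $2.1M lifetime revenue and the reported $1M-$15M/day compute range); everything else is simple division:

```python
# Figures from the reported Sora numbers above.
lifetime_revenue = 2.1e6                            # dollars, total lifetime
daily_compute_low, daily_compute_high = 1e6, 15e6   # dollars/day, reported range

# How many days of compute could the *entire* lifetime revenue cover?
days_best_case = lifetime_revenue / daily_compute_low    # low-end cost estimate
days_worst_case = lifetime_revenue / daily_compute_high  # high-end cost estimate

print(f"Best case: lifetime revenue covers {days_best_case:.1f} days of compute")
print(f"Worst case: lifetime revenue covers {days_worst_case * 24:.1f} hours of compute")
```

Even at the most charitable end of the reported range, all the revenue the product ever earned pays for roughly two days of serving it.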

OK but what about pure text inference?

We know that workload is hopelessly unprofitable too. Anthropic just told us a few days ago: "When we launched Max a year ago it didn't include Claude Code, Cowork didn't exist, and agents that run for hours weren't a thing. Max was designed for heavy chat usage, that's it".

Claude Code already existed at that time; they just didn't anticipate people using it with Max, apparently? An odd decision. But "heavy chat usage" apparently costs at least $200/month, and that's assuming the Max plan wasn't already a loss leader at launch.

If we go back a year, we can find people making the same claim that inference is profitable. But now Anthropic are openly saying they mispriced a plan that costs hundreds of dollars a month because, for coding workflows, it's too cheap. We knew this already - people had been pointing out the huge API/subscription price discrepancy for a long time - but it always led to these same debates about profitable inference.

So what kind of workload are people talking about when they say "inference" is profitable? It's not consumer video. It's not the subscription plans. Do they mean pure LLM API serving? If so and API tokens are profitable by some metric, so what? It counts for nothing in a bankruptcy court - spending all your profit from one SKU on subsidizing another isn't a justification for voiding your debts.

But there's another issue with the narrative that individual models are profitable. If true, it would mean the entire set of losses made by these labs in any given year is driven entirely by the cost of training the next model. That in turn implies training costs are scaling so fast that they not only completely wipe out an otherwise great business but go beyond that and drive it deeply into the red - and that this problem has got massively worse over time. Training costs probably have gone up a lot, but have they really scaled superlinearly? The last I heard, RLVR now consumes about the same compute budget as the actual pretraining, but that would only roughly 2x costs, whereas to make the "each model is super profitable" claim work, training would have to be far more than 2x more expensive than before. And if it were, how come the frontier models appear to be only a small way ahead of models trained by heavily compute-starved Chinese labs operating on a fraction of the budget? Where is all that money going???
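The 2x argument above can be sketched numerically. All dollar figures here are made up purely for illustration; the only number taken from the text is the ~2x multiplier implied by RLVR matching pretraining compute:

```python
# Hypothetical illustration (units are arbitrary, normalized to inference profit).
inference_profit = 1.0    # assume model N's inference earns 1 unit of profit
old_training_cost = 0.5   # HYPOTHETICAL: training model N+1 used to cost half that
rlvr_multiplier = 2.0     # RLVR reportedly matches pretraining compute -> ~2x total

new_training_cost = old_training_cost * rlvr_multiplier
net = inference_profit - new_training_cost

# A 2x bump only moves this hypothetical lab from comfortably profitable
# to break-even. To drive a genuinely profitable inference business
# "deeply into the red", training costs would need to have grown far more
# than 2x relative to inference profit.
print(f"net result per model generation: {net:+.2f}")
```

The point isn't the specific numbers, it's that a 2x training-cost multiplier alone can't reconcile "each model is super profitable" with multi-billion-dollar annual losses.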