| ▲ | duxup 3 days ago |
| It's not clear to me that each new generation of models is going to be "that" much better vs cost. Anecdotally, moving from model to model I'm not seeing huge changes in many use cases. I can just pick an older model and often I can't tell the difference... Video seems to be moving forward fast from what I can tell, but it sounds like the back-end compute cost there is skyrocketing along with it, raising other questions. |
|
| ▲ | renegade-otter 3 days ago | parent | next [-] |
| We do seem to be hitting the top of the curve of diminishing returns. Forget AGI - they need a performance breakthrough in order to stop shoveling money into this cash furnace. |
| |
| ▲ | reissbaker 3 days ago | parent | next [-] | | According to Dario, each model line has generally been profitable: i.e. $200MM to train a model that makes $1B in profit over its lifetime. But, since each model has been more and more expensive to train, they keep needing to raise more money to train the next generation of model, and the company balance sheet looks negative: i.e. they spent more this year than last (since the training cost for model N+1 is higher), and this year's model made less money than they spent this year (even if the model generation itself was profitable, model N isn't profitable enough to train model N+1 without raising — and spending — more money). That's still a pretty good deal for an investor: if I give you $15B, you will probably make a lot more than $15B with it. But it does raise questions about when it will simply become infeasible to train the subsequent model generation due to the costs going up so much (even if, in all likelihood, that model would eventually turn a profit). | |
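(A rough sketch of the dynamic described above, using made-up numbers in the same spirit as Dario's example; none of these are real Anthropic figures, and the growth rates are assumptions:)

    # Toy model of "each generation is profitable, but the company still burns cash".
    # All inputs are illustrative assumptions, not reported financials.
    train_cost = 0.2        # $B to train generation 1 (assumed)
    cost_growth = 4.0       # each generation assumed ~4x more expensive to train
    revenue_multiple = 5.0  # lifetime revenue assumed ~5x the training cost

    for gen in range(1, 5):
        lifetime_revenue = train_cost * revenue_multiple
        next_train_cost = train_cost * cost_growth
        print(f"gen {gen}: trained for ${train_cost:.1f}B, "
              f"earns ~${lifetime_revenue:.1f}B over its lifetime, "
              f"but gen {gen + 1} needs ${next_train_cost:.1f}B up front now")
        train_cost = next_train_cost

Under those assumptions every generation pays for itself eventually, but the next training run needs its larger budget before the current model's lifetime revenue has arrived, so the annual balance sheet stays negative and the company keeps raising.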
| ▲ | dom96 3 days ago | parent | next [-] | | > if I give you $15B, you will probably make a lot more than $15B with it "probably" is the key word here, this feels like a ponzi scheme to me. What happens when the next model isn't a big enough jump over the last one to repay the investment? It seems like this already happened with GPT-5. They've hit a wall, so how can they be confident enough to invest ever more money into this? | | |
| ▲ | bcrosby95 3 days ago | parent [-] | | I think you're really bending over backwards to make this company seem non-viable. If model training has truly turned out to be profitable at the end of each cycle, then this company is going to make money hand over fist, and investing money to outcompete the competition is the right thing to do. Most mega corps started out wildly unprofitable because they kept investing in the core business... until they weren't. It's almost as if people forget the days of Facebook being seen as continually unprofitable. This is how basically all huge tech companies you know today started. | | |
| ▲ | serf 2 days ago | parent | next [-] | | >I think you're really bending over backwards to make this company seem non-viable. Having experienced Anthropic as a customer, I have a hard time thinking that their inevitable failure (something I'd bet on) will be model/capability-based, that's how bad they suck at every other customer-facing metric. You think Amazon is frustrating to deal with? Get into a CSR-chat-loop with an uncaring LLM followed up on by an uncaring CSR. My minimum response time with their customer service is 14 days -- 2 weeks -- while paying $200 a month. An LLM could be 'The Great Kreskin' and I would still try to avoid paying for that level of abuse. | |
| ▲ | sbarre 2 days ago | parent | next [-] | | Maybe you don't want to share, but I'm scratching my head trying to think of something I would need to talk to Anthropic's customer service about that would be urgent and un-straightforward enough to frustrate me to the point of using the term "abuse"... | |
| ▲ | babelfish 2 days ago | parent [-] | | Particularly since they seem to be complaining about service as a consumer, rather than an enterprise... | | |
| |
| ▲ | StephenHerlihyy 2 days ago | parent | prev [-] | | What's fun is that I have had Anthropic's AI support give me blatantly false information. It tried to tell me that I could get a full year's worth of Claude Max for only $200. When I asked if that was true it quickly backtracked and acknowledged its mistake. I figure someone more litigious will eventually try to capitalize. | |
| |
| ▲ | ricardobayes 2 days ago | parent | prev | next [-] | | It's an interesting case. IMO LLMs are not a product in the classical sense; companies like Anthropic are basically doing "basic research" so others can build products on top of it. Perhaps Anthropic will charge a royalty on the API usage. I personally don't think you can earn billions selling $500 subscriptions. This has been shown by the SaaS industry. But it is yet to be seen whether the wider industry will accept such a royalty model. It would be akin to Kodak charging filmmakers based on the success of the movie. Somehow AI companies will need to build a monetization pipeline that will earn them a small amount of money "with every gulp", if we are using a soft drink analogy. |
| ▲ | Barbing 2 days ago | parent | prev [-] | | Thoughts on Ed Zitron’s pessimism? “There Is No AI Revolution” - Feb ‘25: https://www.wheresyoured.at/wheres-the-money/ | | |
| ▲ | reissbaker 21 hours ago | parent [-] | | Ed Zitron plainly has no idea what he's talking about. For example: "Putting aside the hype and bluster, OpenAI — as with all generative AI model developers — loses money on every single prompt and output. Its products do not scale like traditional software, in that the more users it gets, the more expensive its services are to run because its models are so compute-intensive." While OpenAI's numbers aren't public, this seems very unlikely. Given open-source models can be profitably run for cents per million input tokens at FP8 — and OpenAI is already training (and thus certainly running) in FP4 — even if the closed-source models are many times bigger than the largest open-source models, OpenAI is still making money hand over fist on inference. The GPT-5 API costs $1.25/million input tokens: that's a lot more than it takes in compute to run it. And unless you're using the API, it's incredibly unlikely you're burning through millions of tokens in a week... And yet, subscribers to the chat UI are paying $20/month (at minimum!), which is much more than a few million tokens a week would cost to serve. Ed Zitron repeats his claim many, many, excruciatingly many times throughout the article, and it seems quite central to the point he's trying to make. But he's wrong, and wrong enough that I think you should doubt that he knows much about what he's talking about. (His entire blog seems to be a series of anti-tech screeds, so in general I'm pretty dubious he has deep insight into much of anything in the industry. But he quite obviously doesn't know about the economics of LLM inference.) | |
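(Back-of-the-envelope version of the margin argument above; the serving-cost and usage figures are guesses, not published numbers:)

    # Rough inference-margin arithmetic. The API price is GPT-5's published input
    # rate; the serving cost and heavy-user token count are assumptions.
    api_price_per_mtok = 1.25   # $ per million input tokens (published price)
    serve_cost_per_mtok = 0.30  # $ per million tokens to serve (assumed)
    subscription = 20.00        # $ per month for the base chat plan

    api_margin = 1 - serve_cost_per_mtok / api_price_per_mtok
    print(f"API gross margin on input tokens: {api_margin:.0%}")

    heavy_user_mtok = 5.0       # millions of tokens per month (assumed heavy user)
    serving_cost = heavy_user_mtok * serve_cost_per_mtok
    print(f"cost to serve that user: ~${serving_cost:.2f} vs ${subscription:.2f} paid")

Even if the assumed serving cost is off by several times, the subscription and API prices still sit well above it, which is the core of the objection to Zitron's claim.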
|
|
| |
| ▲ | mandevil 3 days ago | parent | prev | next [-] | | I mean, this is how semiconductors have worked forever. Every new generation of fab costs ~2x what the previous generation did, and you need to build a new fab every couple of years. But (if you could keep the order book full for the fab) it would make a lot of money over its lifetime, and you still needed to borrow/raise even more to build the next generation of fab. And if you were wrong about demand... you got into a really big bust, which is also characteristic of the semiconductor industry. This was the power of Moore's Law: it gave the semiconductor engineers an argument they could use to convince the money-guys to let them raise the capital to build the next fab - see, it's right here in this chart, it says that if we don't do it our competitors will, because this chart shows that it is inevitable. Moore's Law had more of a financial impact than a technological one. And now we're down to a point where only TSMC is for sure going through with the next fab (as a rough estimate of cost, think 40 billion dollars) - Samsung and Intel are both hemming and hawing and trying to get others to go in with them, because that is an awful lot of money to get the next frontier node. Is Apple (and Nvidia, Amazon, Google, etc.) willing to pay the costs (in delivery delays, higher costs, etc.) to continue to have a second potential supplier around, or just bite the bullet and commit to TSMC being the only company that can build a frontier node? And even if they can make it to the next node (1.4nm/14A), can they get to the one after that? The implication for AI models is that they can end up like Intel (or AMD, selling off their fab) if they misstep badly enough on one or two nodes in a row. This was the real threat of DeepSeek: if they could get frontier models for an order of magnitude cheaper, then the entire economics of this doesn't work. If they can't keep up, then the economics of it might, so long as people are willing to pay more for the value produced by the new models. | |
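(A minimal sketch of how that cost curve compounds; the starting figure and the 2x-per-generation rule of thumb are the rough numbers from the comment above, not actual fab costs:)

    # If each fab generation costs roughly 2x the previous one, an assumed ~$10B
    # fab reaches the ~$40B frontier-node figure two generations later.
    cost_b = 10.0  # $B, assumed cost of a recent leading-edge fab
    for step in range(4):
        print(f"generation +{step}: ~${cost_b:.0f}B")
        cost_b *= 2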
| ▲ | m101 2 days ago | parent [-] | | Except it's like a second-tier semi manufacturer spending 10x less on the same fab one year later. Here it might make sense to wait a bit. There will be customers, especially considering the diminishing returns these models seem to have run into. If performance were still improving I'd agree with you, but it's not.
| |
| ▲ | majormajor 2 days ago | parent | prev | next [-] | | Do they have a function to predict in advance if the next model is going to be profitable? If not, this seems like a recipe for bankruptcy. You are always investing more than you're making, right up until the day you don't make it back. Whether that's next year or in ten or twenty years. It's basically impossible to do it forever - there simply isn't enough profit to be had in the world if you go forward enough orders of magnitude. How will they know when to hop off the train? | | | |
| ▲ | Avshalom 2 days ago | parent | prev | next [-] | | If you're referring to https://youtu.be/GcqQ1ebBqkc?t=1027 he doesn't actually say that each model has been profitable. He says "You paid $100 million and then it made $200 million of revenue. There's some cost to inference with the model, but let's just assume in this cartoonish cartoon example that even if you add those two up, you're kind of in a good state. So, if every model was a company, the model is actually, in this example is actually profitable. What's going on is that at the same time" Notice those are hypothetical numbers, and he just asks you to assume that inference is (sufficiently) profitable. He doesn't actually say they made money by the EoL of some model. |
| ▲ | 9cb14c1ec0 2 days ago | parent | prev | next [-] | | That can only be true if someone else is subsidizing Anthropic's compute. The calculation is simple: Annualized depreciation costs on the AI buildout (hundreds of billions, possibly a trillion invested) are more than the combined total annualized revenue of the inference industry. A more realistic computation of expenses would show each model line very deeply in the red. |
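(The rough arithmetic behind that claim, with placeholder figures; both inputs are assumptions rather than reported numbers:)

    # Annualized depreciation on the buildout vs. industry-wide inference revenue.
    # Both figures are placeholder assumptions for illustration.
    buildout_capex_b = 500.0    # $B invested in AI datacenters (assumed)
    useful_life_years = 5       # typical accelerator depreciation window (assumed)
    inference_revenue_b = 60.0  # $B/year across all providers (assumed)

    annual_depreciation_b = buildout_capex_b / useful_life_years
    print(f"annual depreciation ~${annual_depreciation_b:.0f}B "
          f"vs inference revenue ~${inference_revenue_b:.0f}B")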
| ▲ | oblio 2 days ago | parent | prev | next [-] | | > According to Dario, each model line has generally been profitable: i.e. $200MM to train a model that makes $1B in profit over its lifetime. Surely the Anthropic CEO will have no incentive to lie. | | |
| ▲ | nielsbot 2 days ago | parent [-] | | Not saying he's above lying, but I do believe there are potential legal ramifications from a CEO lying. (Assuming they get caught) |
| |
| ▲ | viscanti 3 days ago | parent | prev | next [-] | | Well, how much of it is correlation vs. causation? Does the next generation of model unlock another 10x usage? Or was Claude 3 "good enough" that it got traction from early adopters and Claude 4 is "good enough" that it's getting a lot of mid/late adopters using it for this generation? Presumably competitors get better and at cheaper prices (Anthropic charges a premium per token currently) as well. |
| ▲ | yahoozoo 2 days ago | parent | prev [-] | | What about inference costs? |
| |
| ▲ | mikestorrent 3 days ago | parent | prev | next [-] | | Inference performance per watt is continuing to improve, so even if we hit the peak of what LLM technology can scale to, we'll see tokens per second, per dollar, and per watt continue to improve for a long time yet. I don't think we're hitting peak of what LLMs can do, at all, yet. Raw performance for one-shot responses, maybe; but there's a ton of room to improve "frameworks of thought", which are what agents and other LLM based workflows are best conceptualized as. The real question in my mind is whether we will continue to see really good open-source model releases for people to run on their own hardware, or if the companies will become increasingly proprietary as their revenue becomes more clearly tied up in selling inference as a service vs. raising massive amounts of money to pursue AGI. | | |
| ▲ | ethbr1 2 days ago | parent [-] | | My guess would be that it parallels other backend software revolutions. Initially, first-party proprietary solutions are in front. Then, as the second-party ecosystem matures, they build on highest-performance proprietary solutions. Then, as second parties monetize, they begin switching to OSS/commodity solutions to lower COGS. And with wider use, these begin to outcompete proprietary solutions on ergonomics and stability (even if not absolute performance). While Anthropic and OpenAI are incinerating money, why not build on their platforms? As soon as they stop, the scales tilt towards an Apache/nginx-type commoditized backend. |
| |
| ▲ | duxup 3 days ago | parent | prev | next [-] | | >cash furnace They don't even burn it on AI all the time either: https://openai.com/sam-and-jony/ | |
| ▲ | dmbche 3 days ago | parent | next [-] | | "May 21, 2025 This is an extraordinary moment. Computers are now seeing, thinking and understanding. Despite this unprecedented capability, our experience remains shaped by traditional products and interfaces." I don't even want to learn about them; every line is so exhausting. | |
| ▲ | serf 2 days ago | parent | prev [-] | | I was expecting a wedding or birth announcement from that picture framing and title. "We would like to introduce you to the spawn of Johnny Ive and Sam Altman, we're naming him Damien Thorn." |
| |
| ▲ | jayde2767 3 days ago | parent | prev | next [-] | | "cash furnace", so aptly put. | | |
| ▲ | nielsbot 2 days ago | parent | next [-] | | And don't forget the furnace furnace: gas/coal to power all this. | |
| ▲ | gizajob 2 days ago | parent | prev [-] | | The economics will work out when the district heating is run off the local AI/cash furnace. |
| |
| ▲ | general1465 3 days ago | parent | prev | next [-] | | Yep, we do. There is a year-old video on YouTube which describes this limitation: https://www.youtube.com/watch?v=5eqRuVp65eY It's called the efficient compute frontier. |
| ▲ | fredoliveira 3 days ago | parent | prev [-] | | I think that the performance unlock from ramping up RL (RLVR specifically) is not fully priced into the current generation yet. Could be wrong, and people closer to the metal will know better, but people I talk to still feel optimistic about the next couple of years. |
|
|
| ▲ | derefr 3 days ago | parent | prev | next [-] |
> Anecdotally moving from model to model I'm not seeing huge changes in many use cases. Probably because you're doing things that are hitting mostly the "well-established" behaviors of these models — the ones that have been stable for at least a full model-generation now, that the AI bigcorps are currently happy keeping stable (since they achieved 100% on some previous benchmark for those behaviors, and changing them now would be a regression per those benchmarks.) Meanwhile, the AI bigcorps are focusing on extending these models' capabilities at the edge/frontier, to get them to do things they can't currently do. (Mostly this is inside-baseball stuff to "make the model better as a tool for enhancing the model": ever-better domain-specific analysis capabilities, to "logic out" whether training data belongs in the training corpus for some fine-tune; and domain-specific synthesis capabilities, to procedurally generate unbounded amounts of useful fine-tuning corpus for specific tasks, a la AlphaZero playing unbounded amounts of Go games against itself to learn on.) This means that the models are getting constantly bigger. And this is unsustainable. So, obviously, the goal here is to go through this as a transitional bootstrap phase, to reach some goal that allows the size of the models to be reduced. IMHO these models will mostly stay stable-looking for their established consumer-facing use-cases, while slowly expanding TAM "in the background" into new domain-specific use-cases (e.g. constructing novel math proofs in iterative cooperation with a prover) — until eventually, the sum of those added domain-specific capabilities will turn out to have all along doubled as a toolkit these companies were slowly building to "use models to analyze models" — allowing the AI bigcorps to apply models to the task of optimizing models down to something that runs with positive-margin OpEx on whatever hardware is available at that time, 5+ years down the line. And then we'll see them turn to genuinely improving the model behavior for consumer use-cases again; because only at that point will they genuinely be making money by scaling consumer usage — rather than treating consumer usage purely as a marketing loss-leader paid for by the professional usage + ongoing capital investment that that consumer usage inspires.
| |
| ▲ | Workaccount2 3 days ago | parent | next [-] | | >Mostly this is inside-baseball stuff to "make the model better as a tool for enhancing the model" Last week I put GPT-5 and Gemini 2.5 in a conversation with each other about a topic of GPT-5's choosing. What did it pick? Improving LLMs. The conversation was far over my head, but the two seemed to be readily able to get deep into the weeds on it. I took it as a pretty strong signal that they have an extensive training set of transformer/LLM tech. | | |
| ▲ | temp0826 2 days ago | parent [-] | | Like trying to have a lunch conversation with coworkers about anything other than work |
| |
| ▲ | StephenHerlihyy 2 days ago | parent | prev | next [-] | | My understanding is that models are already merely a confederation of many smaller sub-models being used as "tools" to derive answers. I am surprised that it took us this long to solve the "AI + Microservices = GOLD!" equation. |
| ▲ | kdmtctl 3 days ago | parent | prev [-] | | You have just described a singularity point for this line of business. Which could happen. Or not. | | |
| ▲ | derefr 3 days ago | parent [-] | | I wouldn't describe it as a singularity point. I don't mean that they'll get models to design better model architectures, or come up with feature improvements for the inference/training host frameworks, etc. Instead, I mean that these later-generation models will be able to be fine-tuned to do things like e.g. recognizing and discretizing "feature circuits" out of the larger model NN into algorithms, such that humans can then simplify these algorithms (representing the fuzzy / incomplete understanding a model learned of a regular digital-logic algorithm) into regular code; expose this code as primitives/intrinsics the inference kernel has access to (e.g. by having output vectors where every odd position represents a primitive operation to be applied before the next attention pass, and every even position represents a parameter for the preceding operation to take); cut out the original circuits recognized by the discretization model, substituting simple layer passthrough with calls to these operations; continue training from there, to collect new, higher-level circuits that use these operations; extract + burn in + reference those; and so on; and then, after some amount of this, go back and re-train the model from the beginning with all these gained operations already being available from the start, "for effect." Note that human ingenuity is still required at several places in this loop; you can't make a model do this kind of recursive accelerator derivation to itself without any cross-checking, and still expect to get a good result out the other end. (You could, if you could take the accumulated intuition and experience of an ISA designer that guides them to pick the set of CISC instructions to actually increase FLOPS-per-watt rather than just "pushing food around on the plate" — but long explanations or arguments about ISA design, aren't the type of thing that makes it onto the public Internet; and even if they did, there just aren't enough ISAs that have ever been designed for a brute-force learner like an LLM to actually learn any lessons from such discussions. You'd need a type of agent that can make good inferences from far less training data — which is, for now, a human.) |
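(Purely as a toy illustration of the odd/even output-vector encoding sketched above; every op name here is hypothetical, and nothing like this exists in current inference stacks:)

    # Toy decoder: odd positions (1st, 3rd, ...) of the output vector name a
    # primitive op, even positions carry that op's parameter. Entirely hypothetical.
    PRIMITIVES = {0: "noop", 1: "sort", 2: "count", 3: "lookup"}  # made-up op set

    def decode_ops(vec):
        """Pair each op slot with the parameter slot that follows it."""
        ops = []
        for i in range(0, len(vec) - 1, 2):
            op_id, param = vec[i], vec[i + 1]
            ops.append((PRIMITIVES.get(int(op_id), "noop"), param))
        return ops

    print(decode_ops([1, 0.7, 3, 0.2]))  # [('sort', 0.7), ('lookup', 0.2)]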
|
|
|
| ▲ | ACCount37 3 days ago | parent | prev | next [-] |
| The raw model scale is not increasing by much lately. AI companies are constrained by what fits in this generation of hardware, and waiting for the next generation to become available. Models that are much larger than the current frontier are still too expensive to train, and far too expensive to serve them en masse. In the meanwhile, "better data", "better training methods" and "more training compute" are the main ways you can squeeze out more performance juice without increasing the scale. And there are obvious gains to be had there. |
| |
| ▲ | robwwilliams 3 days ago | parent | next [-] | | The jump to a 1-million-token context for Sonnet 4, plus internet access, has been a game-changer for me.
And somebody should remind Anthropic leadership to at least mirror Wikipedia; better yet support Wikipedia actively. All of the big AI players have profited from Wikipedia, but have they given anything back, or are they just parasites on FOSS and free data? | |
| ▲ | xnx 3 days ago | parent | prev [-] | | > AI companies are constrained by what fits in this generation of hardware, and waiting for the next generation to become available. Does this apply to Google, which is using custom-built TPUs while everyone else uses stock Nvidia? | |
| ▲ | ACCount37 3 days ago | parent [-] | | By all accounts, what's in Google's racks right now (TPU v5e, v6e) is vaguely H100-adjacent, in both raw performance and supported model size. If Google wants anything better than that? They, too, have to wait for the new hardware to arrive. Chips have a lead time - they may be your own designs, but you can't just wish them into existence. | | |
| ▲ | xxpor 3 days ago | parent [-] | | Aren't chips + memory constrained by process + reticle size? And therefore, how much HBM you can stuff around the compute chip? I'd expect everyone to more or less support the same model size at the same time because of this, without a very fundamentally different architecture. |
|
|
|
|
| ▲ | gmadsen 3 days ago | parent | prev | next [-] |
It's not clear to me that it needs to. If, at the margins, it can still provide an advantage in the market or national defense, then the spice must flow
| |
| ▲ | duxup 3 days ago | parent [-] | | I suspect it needs to if it is going to cover the costs of training. |
|
|
| ▲ | yieldcrv 3 days ago | parent | prev | next [-] |
Locally run video models that are just as good as today's closed models are going to be the watershed moment. The companies doing foundational video models have stakeholders that don't want to be associated with what people really want to generate. But they are pushing the space forward, and the uncensored, unrestricted video model is coming.
| |
| ▲ | lynx97 3 days ago | parent | next [-] | | Maybe. The question is, will legislation be fast enough? Maybe, if people keep going for politician porn: https://www.theguardian.com/world/2025/aug/28/outrage-in-ita... | | |
| ▲ | kaashif 3 days ago | parent [-] | | Well considering it has been possible to produce similar doctored images for decades at this point, I think we can conclude legislation has not been fast enough. That article is nothing to do with AI, really. | | |
| ▲ | yieldcrv 2 days ago | parent [-] | | And people focus way too much on superimposed images instead of completely new digital avatars, which is what's already taking off now.
|
| |
| ▲ | giancarlostoro 3 days ago | parent | prev | next [-] | | Nobody wants to make a commercial NSFW model that then suffers a jailbreak... for what is the most illegal NSFW content. | | |
| ▲ | yieldcrv 3 days ago | parent | next [-] | | That's the thing: what's "illegal" will challenge our whole society when it comes to dynamically generated, real, interactive avatars that are new humans. When it comes to sexually explicit content with adults in general, all of our laws rely on the human actor existing. FOSTA and SESTA are related to user-generated content of humans, for example; they rely on making sure an actual human isn't being exploited, and they burden everyone with that enforcement. When everyone can just say "that's AI", nobody's going to care, and platforms will be willing to take that risk of it being true again - or a new hit platform will. That kind of content doesn't exist in large quantities yet, and won't until an ungimped video model can generate it. Concerns about trafficking only apply to actual humans, not entirely new avatars. Regarding children, there are more restrictions that may already cover this; there is a large market for just adult-looking characters though, and worries about underage content can be tackled independently, or be found entirely futile. Not my problem; focus on what you can control. This is what's coming, though. People already don't mind parasocial relationships with generative AI and already pay for that; just add nudity. | |
| ▲ | tick_tock_tick 2 days ago | parent | prev | next [-] | | It's going to be really weird when huge swaths of the internet are illegal to visit outside the USA because you keep running into that kind of AI-generated "content". |
| ▲ | simianwords 2 days ago | parent | prev [-] | | Why is this illegal, btw? I mean, what's stopping an AI company from releasing a proper NSFW model? I hope it doesn't happen, but I want to know what prevents them from doing it now. | |
| ▲ | baq 2 days ago | parent [-] | | In some jurisdictions, generating a swastika or a hammer and sickle is illegal. That said, I'm sure you can imagine that the really illegal, truly, positively sickening and immoral stuff is children-adjacent, and you can be 100% sure there are sociopaths doing training runs for the broken people who'll buy the weights. | |
| ▲ | simianwords 2 days ago | parent [-] | | Is it illegal to use mspaint to generate similar vile things? | | |
| ▲ | Majromax 2 days ago | parent | next [-] | | Not in the United States, but it is illegal in some jurisdictions. Additionally, the entire "payment processors leaning on Steam" thing shows that it might be very difficult to monetize a model that's known for generating extremely controversial content. Without monetization, it would be hard for any company to support the training (and potential release) of an unshackled enterprise-grade model. | |
| ▲ | tick_tock_tick 2 days ago | parent | prev [-] | | Most of Europe doesn't really have free speech; frankly, most of the world doesn't. Privileges like making mspaint drawings of nearly whatever you want are pretty uniquely American. |
|
|
|
| |
| ▲ | xenobeb 2 days ago | parent | prev [-] | | The problem is the video models are only impressive in news stories about them. When you actually try to use them, you can see how the marketing is playing to people's imagination, because they are such a massive disappointment. | |
|
|
| ▲ | wslh 3 days ago | parent | prev | next [-] |
| > Anecdotally moving from model to model I'm not seeing huge changes in many use cases. I can just pick an older model and often I can't tell the difference... Model specialization. For example a model with legal knowledge based on [private] sources not used until now. |
|
| ▲ | dvfjsdhgfv 3 days ago | parent | prev | next [-] |
> I can just pick an older model and often I can't tell the difference... Or, as in the case of a leading North American LLM provider, I would love to be able to choose an older model, but it chooses one for me instead.
|
| ▲ | darepublic 3 days ago | parent | prev | next [-] |
| I hope you're right. |
|
| ▲ | ljlolel 3 days ago | parent | prev [-] |
The scaling laws already predict diminishing returns