| ▲ | llamasushi 3 days ago |
| The compute moat is getting absolutely insane. We're basically at the point where you need a small country's GDP just to stay in the game for one more generation of models. What gets me is that this isn't even a software moat anymore - it's literally just whoever can get their hands on enough GPUs and power infrastructure. TSMC and the power companies are the real kingmakers here. You can have all the talent in the world but if you can't get 100k H100s and a dedicated power plant, you're out. Wonder how much of this $13B is just prepaying for compute vs actual opex. If it's mostly compute, we're watching something weird happen - like the privatization of Manhattan Project-scale infrastructure. Except instead of enriching uranium we're computing gradient descents, lol. The wildest part is we might look back at this as cheap. GPT-4 training was what, $100M? GPT-5/Opus-4 class probably $1B+? At this rate GPT-7 will need its own sovereign wealth fund. |
|
| ▲ | AlexandrB 2 days ago | parent | next [-] |
| The whole LLM era is horrible. All the innovation is coming "top-down" from very well funded companies - many of them tech incumbents, so you know the monetization is going to be awful. Since the models are expensive to run, it's all subscription-priced and has to run in the cloud, where the user has no control. The hype is insane, and so usage is being pushed by C-suite folks who have no idea whether it's actually benefiting someone "on the ground", and decisions around which AI to use are often being made on the basis of existing vendor relationships. Basically it's the culmination of all the worst tech trends of the last 10 years. |
| |
| ▲ | dpe82 2 days ago | parent | next [-] | | In a previous generation, the enabler of all our computer tech innovation was the incredible pace of compute growth due to Moore's Law, which was also "top-down" from very well-funded companies since designing and building cutting edge chips was (and still is) very, very expensive. The hype was insane, and decisions about what chip features to build were made largely on the basis of existing vendor relationships. Those companies benefited, but so did the rest of us. History rhymes. | | |
| ▲ | JohnMakin 2 days ago | parent | next [-] | | Should probably change this to "was the appearance of an incredible pace of compute growth due to Moore's Law," because even my basic CS classes from 15 years ago were teaching that it was drastically slowing down, and that it isn't really a "law" so much as an observational trend that lasted a few decades. There are limits to how small you can make transistors, and we're not too far from them, at least not in a way that would continue to yield the results of that curve. | | |
| ▲ | noosphr 2 days ago | parent [-] | | The corollary to Moore's Law, that computers get twice as fast every 18 months, died by 2010. People who didn't live through the 80s, 90s and early 00s, when you'd get a computer ten times as fast every 5 years, can't imagine what it was like back then. Today the only way to scale compute is to throw more power at it or settle for the 5% per year real single-core performance improvement. |
| |
| ▲ | BrenBarn 2 days ago | parent | prev | next [-] | | The difference is once you bought one of those chips you could do your own innovation on top of it (i.e., with software) without further interference from those well-funded companies. You can't do that with GPT et al. because of the subscription model. | | |
| ▲ | almogo 2 days ago | parent [-] | | Yes you can? Sure, you can't run GPT-5 locally, but get your hands on a proper GPU and you can still run some very sophisticated local inference. | | |
| ▲ | BrenBarn a day ago | parent [-] | | You can do some, but many of them have license restrictions that prevent you from using them in certain ways. I can buy an Intel chip and deliberately use it to do things that hurt Intel's business (e.g., start a competing company). The big AI companies are trying very hard to make that kind of thing impossible by imposing constraints on the allowed uses of their models. |
|
| |
| ▲ | dmschulman 2 days ago | parent | prev | next [-] | | Eh, if this is true then IBM and Intel would still be the kings of the hill. Plenty of companies came from the bottom up out of nothing during the 90s and 2000s to build multi-billion dollar companies that still dominate the market today. Many of those companies struggled for investment and grew over a long timeframe. The argument is that something like that is not really possible anymore given the absurd upfront investments we're seeing existing AI companies need in order to further their offerings. | | |
| ▲ | dpe82 2 days ago | parent | next [-] | | Anthropic has existed for a grand total of 4 years. But yes, there was a window of opportunity when it was possible to do cutting-edge work without billions of investment. That window of opportunity is now past, at least for LLMs. Many new technologies follow a similar pattern. | | |
| ▲ | falcor84 2 days ago | parent [-] | | What about deepseek r1? That was earlier this year - how do you know that there won't be more "deepseek moments" in the coming years? |
| |
| ▲ | 3uler 2 days ago | parent | prev [-] | | Intel was king of the hill until 2018. | | |
| ▲ | BobbyTables2 2 days ago | parent [-] | | “Bobby, some things are like a tire fire: trying to put it out only makes it worse. You just gotta grab a beer and let it burn.” – Hank Rutherford Hill |
|
| |
| ▲ | HellDunkel 2 days ago | parent | prev [-] | | You completely forgot about the invention of the home computer. If we had all been logging into some mainframe computer using a home terminal, your assessment would be correct. |
| |
| ▲ | simianwords 2 days ago | parent | prev | next [-] | | This is a very pessimistic take. Where else do you think the innovation would come from? Take cloud for example - where did the innovation come from? It was from the top. I have no idea how you came to the conclusion that this implies monetization is going to be awful. How do you know models are expensive to run? They have gone down in price repeatedly in the last 2 years. Why do you assume it has to run in the cloud when open source models can perform well? > The hype is insane, and so usage is being pushed by C-suite folks who have no idea whether it's actually benefiting someone "on the ground" and decisions around which AI to use are often being made on the basis of existing vendor relationships There are hundreds of millions of ChatGPT users weekly. They didn't need a C-suite to push the usage. | | |
| ▲ | AlexandrB 2 days ago | parent | next [-] | | > I have no idea how you came to the conclusion that this implies monetization is going to be awful. Because cloud monetization was awful. It's either endless subscription pricing or ads (or both). Cloud is a terrible counter-example because it started many awful trends that strip consumer rights. For example "forever" plans that get yoinked when the vendor decides they don't like their old business model and want to charge more. | | |
| ▲ | simianwords 2 days ago | parent | next [-] | | Vast majority of cloud users use AWS, GCP and Azure which have metered billing. I'm not sure what you are talking about. | |
| ▲ | throwaway98797 2 days ago | parent | prev | next [-] | | lots of start ups were built on aws. i'd rather have a subscription than no service at all. oh, and one can always just not buy something if it's not valuable enough | |
| ▲ | Daz1 2 days ago | parent | prev [-] | | >Because cloud monetization was awful Citation needed |
| |
| ▲ | acdha 2 days ago | parent | prev | next [-] | | > Take cloud for example - where did the innovation come from? It was from the top. Definitely not. That came years later but in the late 2000s to mid-2010s it was often engineers pushing for cloud services over the executives' preferred in-house services because it turned a bunch of helpdesk tickets and weeks to months of delays into an AWS API call. Pretty soon CTOs were backing it because those teams shipped faster. The consultants picked it up, yes, but they push a lot of things, and usually only the ones actual users want succeed. | | |
| ▲ | HotHotLava 2 days ago | parent | next [-] | | I'm pretty sure OP wasn't talking about the management hierarchy, but "from the top" in the sense that it was big established companies inventing the cloud and innovating and pushing in the space, not small startups. | | |
| ▲ | acdha 2 days ago | parent | next [-] | | That could be, I was definitely thinking of management hierarchy since that difference has been so striking with AI. A lot of my awareness started in the academic HPC world which was a bit ahead in needing high capacity of generic resources but it felt like this came from the edges rather than the major IT giants. Companies like IBM, Microsoft, or HP weren't doing it, and some companies like Oracle or Cisco appeared to think that infrastructure complexity was part of their lock on enterprise IT departments since places with complex hand-maintained runbooks weren't quick to switch vendors. Amazon at the time wasn't seen as a big tech company - they were where you bought CDs – and companies like Joyent or Rackspace had a lot of mindshare as well before AWS started offering virtual compute in 2006. One big factor in all of this was that x86 virtualization wasn't cheap until the mid-to-late 2000s so a lot of people weren't willing to pay high virtualization costs, but without that you're talking services like Bingodisk or S3 rather than companies migrating compute loads. | |
| ▲ | pandemicsyn 2 days ago | parent | prev [-] | | Sure Amazon was a big established co at the dawn of the cloud, and a little bit of an unexpected dark horse. None of the managed hosting providers saw Amazon coming. Also-rans like Rackspace and the like were also pretty established by that point. But there was also cool stuff happening at smaller places like Joyent, Heroku, Slicehost, Linode, Backblaze, iron.io, etc. |
| |
| ▲ | simianwords 2 days ago | parent | prev [-] | | Sure, that's the same way GPT was invented at Google. |
| |
| ▲ | HarHarVeryFunny 2 days ago | parent | prev | next [-] | | The C-suite is pushing business adoption, and those are the GenAI projects of which 95% are failing. | |
| ▲ | simianwords 2 days ago | parent | next [-] | | The other side of it is that lots of users are willingly purchasing the subscription without any need for a push. | |
| ▲ | HarHarVeryFunny 2 days ago | parent | next [-] | | Sure - there are use cases for LLMs that work, and use cases that don't. I think those actually using "AI" have a lot better idea of which are which than the C-suite folk. | |
| ▲ | ath3nd 2 days ago | parent | prev [-] | | And yet we fail to see an uptick in better, higher-quality software; if anything, AI slop is making OSS owners reject AI PRs because of their low quality. I'd wager the personal failure rate when using LLMs is probably even higher than the 95% in enterprise, but will wait to see the numbers. |
| |
| ▲ | og_kalu 2 days ago | parent | prev [-] | | That same report said a lot of people are just using personal accounts for work though. |
| |
| ▲ | BobbyTables2 2 days ago | parent | prev [-] | | Cloud is just “rent to own” without the “own” part. |
| |
| ▲ | awongh 2 days ago | parent | prev | next [-] | | > All the innovation is coming "top-down" from very well funded companies - many of them tech incumbents What I always thought was exceptional is that it turns out it wasn't the incumbents who had the obvious advantage. Set aside the fact that everyone involved is already in the top 0.00001% echelon of the space (Sam Altman and everyone involved with the creation of OpenAI): if you had asked me 10 years ago who would have the leg up in creating advanced AI, I would have said all the big companies hoarding data. Turns out just having that data wasn't a starting requirement for the generation of models we have now. A lot of the top players in the space are not the giant companies with unlimited resources. Of course this isn't the web or web 2.0 era where to start something huge the starting capital was comparatively tiny, but it's interesting to see that the space allows for brand new companies to come out and be competitive against Google and Meta. |
| ▲ | crawshaw 2 days ago | parent | prev | next [-] | | > All the innovation is coming "top-down" from very well funded companies - many of them tech incumbents The model leaders here are OpenAI and Anthropic, two new companies. In the programming space, the next leaders are Qwen and DeepSeek. The one incumbent is Google who trails all four for my workloads. In the DevTools space, a new startup, Cursor, has muscled in on Microsoft's space. This is all capital heavy, yes, because models are capital heavy to build. But the Innovator's Dilemma persists. Startups lead the way. | | |
| ▲ | lexandstuff 2 days ago | parent | next [-] | | And all of those companies except for Google are entirely dependent on NVIDIA, who are the real winners here. |
| ▲ | nightski 2 days ago | parent | prev [-] | | At what point is OpenAI not considered new? It's a few months from being a decade old with 3,000 employees and $60B in funding. | | |
| ▲ | fshr 2 days ago | parent [-] | | Well, compare them to Microsoft: 50 years old with 228,000 employees and $282 billion in revenue. |
|
| |
| ▲ | tedivm 2 days ago | parent | prev | next [-] | | This is only if you ignore the growing open source models. I'm running Qwen3-30B at home and it works great for most of the use cases I have. I think we're going to find that the optimizations coming from companies out of China are going to continue making local LLMs easier for folks to run. | | | |
| ▲ | hintymad 2 days ago | parent | prev | next [-] | | > The whole LLM era is horrible. All the innovation is coming "top-down" from very well funded companies Wouldn't it be the same for the hardware companies? Not everyone could build CPUs as Intel/Motorola/IBM did, not everyone could build mainframes like IBM did, and not everyone could build smart phones like Apple or Samsung did. I'd assume it boils down to the value of the LLMs instead of who has the moat. Of course, personally I really wish everyone could participate in the innovation as in the internet era, like training and serving large models on a laptop. I guess that day will come, like PCs over mainframes, but just not now. |
| ▲ | mlyle 2 days ago | parent | prev | next [-] | | They've gotta hope they get to cheap AGI, though. Any stall in progress either on chips or smartness/FLOP means there's a lot of surplus previous generation gear that can hang and commoditize it all out to open models. Just like how the "dot com bust" brought about an ISP renaissance on all the surplus, cheap-but-slightly-off-leading-edge gear. IMO that's the opportunity for a vibrant AI ecosystem. Of course, if they get to cheap AGI, we're cooked: both from vendors having so much control and the destabilization that will come to labor markets, etc. | |
| ▲ | atleastoptimal 2 days ago | parent | prev | next [-] | | Nevertheless, prices for LLMs at any given level of performance have gone down precipitously over the past few years. Regardless of how bad the decisions being made seem, the decision-making process is both making an extreme amount of money for those in the AI companies and providing extremely cheap, high-quality intelligence for those using their offerings. | |
| ▲ | pimlottc 2 days ago | parent [-] | | Remember when you could get an Uber ride all the way across town for $5? It is way too early to know what prices for these services will actually cost. | | |
| ▲ | atleastoptimal 2 days ago | parent [-] | | Is there an open source Uber? There are multiple open source AI models far beyond what SOTA was just 1 year ago. Even if they don't manage to drive prices down on the most recent closed models, they themselves will never cost more than a trivial amount above the compute they run on, and compute will only get more expensive if demand for AI continues to grow exponentially, which would in turn invite the competitive pressure that drives prices back down. | |
| ▲ | xigoi 19 hours ago | parent [-] | | > There are multiple open source AI models far beyond what SOTA was just 1 year ago. There are many models that call themselves open source, but the source is nowhere to be found, only the weights. |
|
|
| |
| ▲ | chermi 2 days ago | parent | prev | next [-] | | What's the counterfactual? Where would the world be today? Certainly the present is not an optimal allocation of resources; uncertainty and hysteresis make it impossible. But where do you think we'd be instead? Are you assuming all of those dollars would be going to research otherwise? They wouldn't; if not for the hype around "AI" LLMs, research funding would be at 2017 +/- 25% levels. Also think of how many researchers are funded and PhDs are trained because of this awful LLM era. Certainly their skills transfer. (Not that brute forcing with shit tons of compute is standard "research funding".) And for the record I really wish more money was being thrown outside of LLMs. | |
| ▲ | edg5000 2 days ago | parent | prev | next [-] | | How can you dismiss the value of the tech so blatantly? Have you used Opus for general questions and coding? > no idea whether it's actually benefiting someone "on the ground" I really don't get it. Before, we were farmers plowing by hand, and now we are using tractors. I do totally agree with your sentiment that it's still a horrible development though! Before Claude Code, I ran everything offline, all FOSS, owned all my machines, servers etc. Now I'm a subscription user. Zero control, zero privacy. That is the downside of it all. Actually, it's just like the mechanisation of farming! Collectivization in some countries was a nightmare for small land owners who cultivated the land (probably with animals). They went from that to a more efficient, government-controlled collective farm, where they were just farm workers, with the land reclaimed through land reform. That was an upgrade for the efficiency of farming, needing fewer humans for it. But a huge downgrade for the individual small-scale land owners. |
| ▲ | conartist6 2 days ago | parent | prev | next [-] | | Come to the counterrevolution; we have cookies : ) | |
| ▲ | 3uler 2 days ago | parent | prev [-] | | [flagged] | | |
| ▲ | monax 2 days ago | parent [-] | | If you get a 10x speedup with an LLM it means you are not doing anything new or interesting | |
| ▲ | 3uler 2 days ago | parent [-] | | That is 99% of software engineering, boring line of business CRUD applications or data pipelines. Most creativity is just doing some slightly different riff on something done before… Sorry to break it to you but most of your job is just context engineering for yourself. |
|
|
|
|
| ▲ | duxup 3 days ago | parent | prev | next [-] |
| It's not clear to me that each new generation of models is going to be "that" much better vs cost. Anecdotally, moving from model to model I'm not seeing huge changes in many use cases. I can just pick an older model and often I can't tell the difference... Video seems to be moving forward fast from what I can tell, but it sounds like the back-end cost of compute there is skyrocketing with it, raising other questions. |
| |
| ▲ | renegade-otter 3 days ago | parent | next [-] | | We do seem to be hitting the top of the curve of diminishing returns. Forget AGI - they need a performance breakthrough in order to stop shoveling money into this cash furnace. | | |
| ▲ | reissbaker 3 days ago | parent | next [-] | | According to Dario, each model line has generally been profitable: i.e. $200MM to train a model that makes $1B in profit over its lifetime. But, since each model has been more and more expensive to train, they keep needing to raise more money to train the next generation of model, and the company balance sheet looks negative: i.e. they spent more this year than last (since the training cost for model N+1 is higher), and the model made less money this year than they spent (even if the model generation itself was profitable, model N isn't profitable enough to train model N+1 without raising — and spending — more money). That's still a pretty good deal for an investor: if I give you $15B, you will probably make a lot more than $15B with it. But it does raise questions about when it will simply become infeasible to train the subsequent model generation due to the costs going up so much (even if, in all likelihood, that model would eventually turn a profit). | |
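To make that arithmetic concrete, here is a minimal sketch (Python) of the cartoon economics described above. It assumes training cost grows ~10x per generation and each model returns ~2x its training cost in revenue the year after it ships; the multipliers and the starting figure are illustrative, not Anthropic's actual numbers.

    # Illustrative only: per-model economics vs. company-level P&L when training
    # cost grows ~10x per generation and each model earns ~2x its cost a year later.
    TRAIN_COST_GROWTH = 10   # assumed cost multiplier per generation
    REVENUE_MULTIPLE = 2     # assumed lifetime revenue / training cost

    def yearly_pnl(first_cost_b=0.1, years=3):
        """Yield (year, training spend, revenue, company P&L), all in $B."""
        costs = [first_cost_b * TRAIN_COST_GROWTH ** i for i in range(years)]
        for i, cost in enumerate(costs):
            revenue = costs[i - 1] * REVENUE_MULTIPLE if i > 0 else 0.0
            yield 2023 + i, cost, revenue, revenue - cost

    for year, cost, revenue, pnl in yearly_pnl():
        print(f"{year}: train ${cost:.1f}B, revenue ${revenue:.1f}B, P&L ${pnl:+.1f}B")
    # 2023: train $0.1B, revenue $0.0B, P&L $-0.1B
    # 2024: train $1.0B, revenue $0.2B, P&L $-0.8B
    # 2025: train $10.0B, revenue $2.0B, P&L $-8.0B

Under these assumptions each model "company" still returns 2x its own training cost, yet the consolidated balance sheet looks worse every year, which is the dynamic being described.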
| ▲ | dom96 3 days ago | parent | next [-] | | > if I give you $15B, you will probably make a lot more than $15B with it "probably" is the key word here, this feels like a ponzi scheme to me. What happens when the next model isn't a big enough jump over the last one to repay the investment? It seems like this already happened with GPT-5. They've hit a wall, so how can they be confident enough to invest ever more money into this? | | |
| ▲ | bcrosby95 2 days ago | parent [-] | | I think you're really bending over backwards to make this company seem non-viable. If model training has truly turned out to be profitable at the end of each cycle, then this company is going to make money hand over fist, and investing money to outcompete the competition is the right thing to do. Most mega corps started out wildly unprofitable due to investing in the core business... until they weren't. It's almost as if people forget the days of Facebook being seen as continually unprofitable. This is how basically all huge tech companies you know today started. | |
| ▲ | serf 2 days ago | parent | next [-] | | >I think you're really bending over backwards to make this company seem non viable. Having experienced Anthropic as a customer, I have a hard time thinking that their inevitable failure (something i'd bet on) will be model/capability-based, that's how bad they suck at every other customer-facing metric. You think Amazon is frustrating to deal with? Get into a CSR-chat-loop with an uncaring LLM followed up on by an uncaring CSR. My minimum response time with their customer service is 14 days -- 2 weeks -- while paying 200usd a month. An LLM could be 'The Great Kreskin' and I would still try to avoid paying for that level of abuse. | | |
| ▲ | sbarre 2 days ago | parent | next [-] | | Maybe you don't want to share, but I'm scratching my head trying to think of something I would need to talk to Anthropic's customer service about that would be urgent and un-straightforward enough to frustrate me to the point of using the term "abuse". | |
| ▲ | babelfish 2 days ago | parent [-] | | Particularly since they seem to be complaining about service as a consumer, rather than an enterprise... | | |
| |
| ▲ | StephenHerlihyy 2 days ago | parent | prev [-] | | What's fun is that I have had Anthropic's AI support give me blatantly false information. It tried to tell me that I could get a full year's worth of Claude Max for only $200. When I asked if that was true it quickly backtracked and acknowledged its mistake. I figure someone more litigious will eventually try to capitalize. | |
| |
| ▲ | ricardobayes 2 days ago | parent | prev | next [-] | | It's an interesting case. IMO LLMs are not a product in the classical sense, companies like Anthropic are basically doing "basic research" so others can build products on top of it. Perhaps Anthropic will charge a royalty on the API usage. I personally don't think you can earn billions selling $500 subscriptions. This has been shown by the SaaS industry. But it is yet to be seen whether the wider industry will accept such royalty model. It would be akin to Kodak charging filmmakers based on the success of the movie. Somehow AI companies will need to build a monetization pipeline that will earn them a small amount of money "with every gulp", if we are using a soft drink analogy. | |
| ▲ | Barbing 2 days ago | parent | prev [-] | | Thoughts on Ed Zitron’s pessimism? “There Is No AI Revolution” - Feb ‘25: https://www.wheresyoured.at/wheres-the-money/ | | |
| ▲ | reissbaker 21 hours ago | parent [-] | | Ed Zitron plainly has no idea what he's talking about. For example: "Putting aside the hype and bluster, OpenAI — as with all generative AI model developers — loses money on every single prompt and output. Its products do not scale like traditional software, in that the more users it gets, the more expensive its services are to run because its models are so compute-intensive." While OpenAI's numbers aren't public, this seems very unlikely. Given open-source models can be profitably run for cents per million input tokens at FP8 — and OpenAI is already training (and thus certainly running) in FP4 — even if the closed-source models are many times bigger than the largest open-source models, OpenAI is still making money hand over fist on inference. The GPT-5 API costs $1.25/million input tokens: that's a lot more than it takes in compute to run it. And unless you're using the API, it's incredibly unlikely you're burning through millions of tokens in a week... And yet, subscribers to the chat UI are paying $20/month (at minimum!), which is much more than a few million tokens a week would cost. Ed Zitron repeats his claim many, many, excruciatingly many times throughout the article, and it seems quite central to the point he's trying to make. But he's wrong, and wrong enough that I think you should doubt that he knows much about what he's talking about. (His entire blog seems to be a series of anti-tech screeds, so in general I'm pretty dubious he has deep insight into much of anything in the industry. But he quite obviously doesn't know about the economics of LLM inference.) | |
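For a sense of the margin math being argued here, a rough back-of-the-envelope sketch: the serving-cost figure below is an assumption (based on what open-weight models cost to serve), only input tokens are counted, and output tokens and fixed costs are ignored; the prices are the public GPT-5 API input price and the entry-level chat subscription.

    # Rough inference-margin sketch; all figures are illustrative assumptions except
    # the public API and subscription prices. Output tokens (which cost more per
    # token) and fixed costs are deliberately ignored.
    SERVE_COST_PER_M = 0.05           # assumed $/million input tokens to serve
    API_PRICE_PER_M = 1.25            # $/million input tokens (GPT-5 API)
    SUBSCRIPTION = 20.00              # $/month, entry-level chat plan
    TOKENS_PER_MONTH = 4 * 2_000_000  # assume a heavy user: ~2M tokens/week

    api_markup = API_PRICE_PER_M / SERVE_COST_PER_M
    heavy_user_cost = TOKENS_PER_MONTH / 1_000_000 * SERVE_COST_PER_M
    print(f"API price is ~{api_markup:.0f}x the assumed serving cost")
    print(f"Heavy chat user: ~${heavy_user_cost:.2f}/month to serve vs ${SUBSCRIPTION:.2f} paid")
    # API price is ~25x the assumed serving cost
    # Heavy chat user: ~$0.40/month to serve vs $20.00 paid

Even if the assumed serving cost is off by an order of magnitude, the directional conclusion (positive margin on inference) still holds under these assumptions.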
|
|
| |
| ▲ | mandevil 3 days ago | parent | prev | next [-] | | I mean, this is how semiconductors have worked forever. Every new generation of fab costs ~2x what the previous generation did, and you need to build a new fab every couple of years. But (if you could keep the order book full for the fab) it would make a lot of money over its lifetime, and you still needed to borrow/raise even more to build the next generation of fab. And if you were wrong about demand .... you got into a really big bust, which is also characteristic of the semiconductor industry. This was the power of Moore's Law: it gave the semiconductor engineers an argument they could use to convince the money-guys to let them raise the capital to build the next fab- see, it's right here in this chart, it says that if we don't do it our competitors will, because this chart shows that it is inevitable. Moore's Law had more of a financial impact than a technological one. And now we're down to a point where only TSMC is for sure going through with the next fab (as a rough estimate of cost, think 40 billion dollars)- Samsung and Intel are both hemming and hawing and trying to get others to go in with them, because that is an awful lot of money to get the next frontier node. Is Apple (and Nvidia, AMZ, Google, etc.) willing to pay the costs (in delivery delays, higher costs, etc.) to continue to have a second potential supplier around or just bite the bullet and commit to TSMC being the only company that can build a frontier node? And even if they can make it to the next node (1.4nm/14A), can they get to the one after that? The implication for AI models is that they can end up like Intel (or AMD, selling off their fab) if they misstep badly enough on one or two nodes in a row. This was the real threat of Deepseek: if they could get frontier models for an order of magnitude cheaper, then the entire economics of this doesn't work. If they can't keep up, then the economics of it might, so long as people are willing to pay more for the value produced by the new models. | |
| ▲ | m101 2 days ago | parent [-] | | Except it's like a second-tier semi manufacturer spending 10x less on the same fab in one year's time. Here it might make sense to wait a bit. There will be customers, especially considering the diminishing returns these models seem to have run into. If performance was improving I'd agree with you, but it's not. |
| |
| ▲ | majormajor 2 days ago | parent | prev | next [-] | | Do they have a function to predict in advance if the next model is going to be profitable? If not, this seems like a recipe for bankruptcy. You are always investing more than you're making, right up until the day you don't make it back. Whether that's next year or in ten or twenty years. It's basically impossible to do it forever - there simply isn't enough profit to be had in the world if you go forward enough orders of magnitude. How will they know when to hop off the train? | | | |
| ▲ | Avshalom 2 days ago | parent | prev | next [-] | | if you're referring to https://youtu.be/GcqQ1ebBqkc?t=1027 he doesn't actually say that each model has been profitable. He says "You paid $100 million and then it made $200 million of revenue. There's some cost to inference with the model, but let's just assume in this cartoonish cartoon example that even if you add those two up, you're kind of in a good state. So, if every model was a company, the model is actually, in this example is actually profitable. What's going on is that at the same time" notice those are hypothetical numbers and he just asks you to assume that inference is (sufficiently) profitable. He doesn't actually say they made money by the EoL of some model. | |
| ▲ | 9cb14c1ec0 2 days ago | parent | prev | next [-] | | That can only be true if someone else is subsidizing Anthropic's compute. The calculation is simple: Annualized depreciation costs on the AI buildout (hundreds of billions, possibly a trillion invested) are more than the combined total annualized revenue of the inference industry. A more realistic computation of expenses would show each model line very deeply in the red. |
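As a sketch of that depreciation-vs-revenue comparison (every number here is an assumption chosen for illustration, not a reported figure):

    # Illustrative depreciation-vs-revenue comparison; all inputs are assumptions.
    BUILDOUT_CAPEX_B = 600     # assumed cumulative AI infrastructure buildout, $B
    DEPRECIATION_YEARS = 5     # assumed straight-line life of GPUs/data centers
    INFERENCE_REVENUE_B = 50   # assumed total annual inference revenue, $B

    annual_depreciation = BUILDOUT_CAPEX_B / DEPRECIATION_YEARS
    shortfall = annual_depreciation - INFERENCE_REVENUE_B
    print(f"Annual depreciation ~${annual_depreciation:.0f}B vs "
          f"inference revenue ~${INFERENCE_REVENUE_B}B -> shortfall ~${shortfall:.0f}B/yr")
    # Annual depreciation ~$120B vs inference revenue ~$50B -> shortfall ~$70B/yr

Whether the industry-wide numbers actually look like this depends entirely on the assumed capex, asset life, and revenue, which is the crux of the disagreement in this thread.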
| ▲ | oblio 2 days ago | parent | prev | next [-] | | > According to Dario, each model line has generally been profitable: i.e. $200MM to train a model that makes $1B in profit over its lifetime. Surely the Anthropic CEO will have no incentive to lie. | | |
| ▲ | nielsbot 2 days ago | parent [-] | | Not saying he's above lying, but I do believe there are potential legal ramifications from a CEO lying. (Assuming they get caught) |
| |
| ▲ | viscanti 3 days ago | parent | prev | next [-] | | Well, how much of it is correlation vs causation? Does the next generation of models unlock another 10x usage? Or was Claude 3 "good enough" that it got traction from early adopters and Claude 4 is "good enough" that it's getting a lot of mid/late adopters using it for this generation? Presumably competitors get better and at cheaper prices (Anthropic charges a premium per token currently) as well. |
| ▲ | yahoozoo 2 days ago | parent | prev [-] | | What about inference costs? |
| |
| ▲ | mikestorrent 3 days ago | parent | prev | next [-] | | Inference performance per watt is continuing to improve, so even if we hit the peak of what LLM technology can scale to, we'll see tokens per second, per dollar, and per watt continue to improve for a long time yet. I don't think we're hitting peak of what LLMs can do, at all, yet. Raw performance for one-shot responses, maybe; but there's a ton of room to improve "frameworks of thought", which are what agents and other LLM based workflows are best conceptualized as. The real question in my mind is whether we will continue to see really good open-source model releases for people to run on their own hardware, or if the companies will become increasingly proprietary as their revenue becomes more clearly tied up in selling inference as a service vs. raising massive amounts of money to pursue AGI. | | |
| ▲ | ethbr1 2 days ago | parent [-] | | My guess would be that it parallels other backend software revolutions. Initially, first-party proprietary solutions are in front. Then, as the second-party ecosystem matures, they build on highest-performance proprietary solutions. Then, as second parties monetize, they begin switching to OSS/commodity solutions to lower COGS. And with wider use, these begin to outcompete proprietary solutions on ergonomics and stability (even if not absolute performance). While Anthropic and OpenAI are incinerating money, why not build on their platforms? As soon as they stop, scales tilt towards an Apache/nginx-type commoditized backend. |
| |
| ▲ | duxup 3 days ago | parent | prev | next [-] | | >cash furnace They don't even burn it on on AI all the time either: https://openai.com/sam-and-jony/ | | |
| ▲ | dmbche 3 days ago | parent | next [-] | | "May 21, 2025 This is an extraordinary moment. Computers are now seeing, thinking and understanding. Despite this unprecedented capability, our experience remains shaped by traditional products and interfaces." I don't even want to learn about them every line is so exhausting | | | |
| ▲ | serf 2 days ago | parent | prev [-] | | I was expecting a wedding or birth announcement from that picture framing and title. "We would like to introduce you to the spawn of Johnny Ive and Sam Altman, we're naming him Damien Thorn." |
| |
| ▲ | jayde2767 3 days ago | parent | prev | next [-] | | "cash furnace", so aptly put. | | |
| ▲ | nielsbot 2 days ago | parent | next [-] | | And don't forget the furnace furnace: gas/coal to power all this. | |
| ▲ | gizajob 2 days ago | parent | prev [-] | | The economics will work out when the district heating is run off the local AI/cash furnace. |
| |
| ▲ | general1465 3 days ago | parent | prev | next [-] | | Yep we do. There is a one-year-old video on YouTube which describes this limitation: https://www.youtube.com/watch?v=5eqRuVp65eY It's called the efficient compute frontier. |
| ▲ | fredoliveira 3 days ago | parent | prev [-] | | I think that the performance unlock from ramping up RL (RLVR specifically) is not fully priced into the current generation yet. Could be wrong, and people closer to the metal will know better, but people I talk to still feel optimistic about the next couple of years. |
| |
| ▲ | derefr 3 days ago | parent | prev | next [-] | | > Anecdotally moving from model to model I'm not seeing huge changes in many use cases. Probably because you're doing things that are hitting mostly the "well-established" behaviors of these models — the ones that have been stable for at least a full model-generation now, that the AI bigcorps are currently happy keeping stable (since they achieved 100% on some previous benchmark for those behaviors, and changing them now would be a regression per those benchmarks.) Meanwhile, the AI bigcorps are focusing on extending these models' capabilities at the edge/frontier, to get them to do things they can't currently do. (Mostly this is inside-baseball stuff to "make the model better as a tool for enhancing the model": ever-better domain-specific analysis capabilities, to "logic out" whether training data belongs in the training corpus for some fine-tune; and domain-specific synthesis capabilities, to procedurally generate unbounded amounts of useful fine-tuning corpus for specific tasks, ala AlphaZero playing unbounded amounts of Go games against itself to learn on.) This means that the models are getting constantly bigger. And this is unsustainable. So, obviously, the goal here is to go through this as a transitionary bootstrap phase, to reach some goal that allows the size of the models to be reduced. IMHO these models will mostly stay stable-looking for their established consumer-facing use-cases, while slowly expanding TAM "in the background" into new domain-specific use-cases (e.g. constructing novel math proofs in iterative cooperation with a prover) — until eventually, the sum of those added domain-specific capabilities will turn out to have all along doubled as a toolkit these companies were slowly building to "use models to analyze models" — allowing the AI bigcorps to apply models to the task of optimizing models down to something that run with positive-margin OpEx on whatever hardware that would be available at that time 5+ years down the line. And then we'll see them turn to genuinely improving the model behavior for consumer use-cases again; because only at that point will they genuinely be making money by scaling consumer usage — rather than treating consumer usage purely as a marketing loss-leader paid for by the professional usage + ongoing capital investment that that consumer usage inspires. | | |
| ▲ | Workaccount2 3 days ago | parent | next [-] | | >Mostly this is inside-baseball stuff to "make the model better as a tool for enhancing the model" Last week I put GPT-5 and Gemini 2.5 in a conversation with each other about a topic of GPT-5's choosing. What did it pick? Improving LLMs. The conversation was far over my head, but the two seemed to be readily able to get deep into the weeds on it. I took it as a pretty strong signal that they have an extensive training set of transformer/LLM tech. | | |
| ▲ | temp0826 2 days ago | parent [-] | | Like trying to have a lunch conversation with coworkers about anything other than work |
| |
| ▲ | StephenHerlihyy 2 days ago | parent | prev | next [-] | | My understanding is that models are already merely a confederation of many smaller sub-models being used as "tools" to derive answers. I am surprised that it took us this long to solve the "AI + Microservices = GOLD!" equation. |
| ▲ | kdmtctl 3 days ago | parent | prev [-] | | You have just described a singularity point for this line of business. Which could happen. Or not. | | |
| ▲ | derefr 3 days ago | parent [-] | | I wouldn't describe it as a singularity point. I don't mean that they'll get models to design better model architectures, or come up with feature improvements for the inference/training host frameworks, etc. Instead, I mean that these later-generation models will be able to be fine-tuned to do things like e.g. recognizing and discretizing "feature circuits" out of the larger model NN into algorithms, such that humans can then simplify these algorithms (representing the fuzzy / incomplete understanding a model learned of a regular digital-logic algorithm) into regular code; expose this code as primitives/intrinsics the inference kernel has access to (e.g. by having output vectors where every odd position represents a primitive operation to be applied before the next attention pass, and every even position represents a parameter for the preceding operation to take); cut out the original circuits recognized by the discretization model, substituting simple layer passthrough with calls to these operations; continue training from there, to collect new, higher-level circuits that use these operations; extract + burn in + reference those; and so on; and then, after some amount of this, go back and re-train the model from the beginning with all these gained operations already being available from the start, "for effect." Note that human ingenuity is still required at several places in this loop; you can't make a model do this kind of recursive accelerator derivation to itself without any cross-checking, and still expect to get a good result out the other end. (You could, if you could take the accumulated intuition and experience of an ISA designer that guides them to pick the set of CISC instructions to actually increase FLOPS-per-watt rather than just "pushing food around on the plate" — but long explanations or arguments about ISA design, aren't the type of thing that makes it onto the public Internet; and even if they did, there just aren't enough ISAs that have ever been designed for a brute-force learner like an LLM to actually learn any lessons from such discussions. You'd need a type of agent that can make good inferences from far less training data — which is, for now, a human.) |
|
| |
| ▲ | ACCount37 3 days ago | parent | prev | next [-] | | The raw model scale is not increasing by much lately. AI companies are constrained by what fits in this generation of hardware, and waiting for the next generation to become available. Models that are much larger than the current frontier are still too expensive to train, and far too expensive to serve them en masse. In the meanwhile, "better data", "better training methods" and "more training compute" are the main ways you can squeeze out more performance juice without increasing the scale. And there are obvious gains to be had there. | | |
| ▲ | robwwilliams 3 days ago | parent | next [-] | | The jump to 1 million token length context for Sonnet 4 plus access to internet has been a game-changer for me. And somebody should remind Anthropic leadership to at least mirror Wikipedia; better yet support Wikipedia actively. All of the big AI players have profited from Wikipedia, but have they given anything back, or are they just parasites on FOSS and free data? | |
| ▲ | xnx 3 days ago | parent | prev [-] | | > AI companies are constrained by what fits in this generation of hardware, and waiting for the next generation to become available. Does this apply to Google that is using custom built TPUs while everyone else uses stock Nvidia? | | |
| ▲ | ACCount37 3 days ago | parent [-] | | By all accounts, what's in Google's racks right now (TPU v5e, v6e) is vaguely H100-adjacent, in both raw performance and supported model size. If Google wants anything better than that? They, too, have to wait for the new hardware to arrive. Chips have a lead time - they may be your own designs, but you can't just wish them into existence. | | |
| ▲ | xxpor 3 days ago | parent [-] | | Aren't chips + memory constrained by process + reticle size? And therefore, how much HBM you can stuff around the compute chip? I'd expect everyone to more or less support the same model size at the same time because of this, without a very fundamentally different architecture. |
|
|
| |
| ▲ | gmadsen 3 days ago | parent | prev | next [-] | | It's not clear to me that it needs to. If at the margins it can still provide an advantage in the market or national defense, then the spice must flow | |
| ▲ | duxup 3 days ago | parent [-] | | I suspect it needs to if it is going to cover the costs of training. |
| |
| ▲ | yieldcrv 3 days ago | parent | prev | next [-] | | Locally run video models that are just as good as today’s closed models are going to be the watershed moment The companies doing foundational video models have stakeholders that don’t want to be associated with what people really want to generate But they are pushing the space forward and the uncensored and unrestricted video model is coming | | |
| ▲ | lynx97 3 days ago | parent | next [-] | | Maybe. The question is, will legislation be fast enough? Maybe, if people keep going for politician porn: https://www.theguardian.com/world/2025/aug/28/outrage-in-ita... | | |
| ▲ | kaashif 3 days ago | parent [-] | | Well considering it has been possible to produce similar doctored images for decades at this point, I think we can conclude legislation has not been fast enough. That article is nothing to do with AI, really. | | |
| ▲ | yieldcrv 2 days ago | parent [-] | | and people focus way too much on superimposed images instead of completely new digital avatars, which is what's already taking off now |
|
| |
| ▲ | giancarlostoro 3 days ago | parent | prev | next [-] | | Nobody wants to make a commercial NSFW model that then suffers a jailbreak... for what is the most illegal NSFW content. | | |
| ▲ | yieldcrv 3 days ago | parent | next [-] | | That's the thing: what's "illegal" will challenge our whole society when it comes to dynamically generated, real, interactive avatars that are new humans. When it comes to sexually explicit content with adults in general, all of our laws rely on the human actor existing. FOSTA and SESTA are related to user-generated content of humans, for example. They rely on making sure an actual human isn't being exploited, and they burden everyone with that enforcement. When everyone can just say "that's AI" nobody's going to care, and platforms will be willing to take that risk of it being true again - or a new hit platform will. That kind of content currently doesn't exist in large quantities, and won't until an ungimped video model can generate it. Concerns about trafficking only apply to actual humans, not entirely new avatars. Regarding children there are more restrictions that may already cover this; there is a large market for just adult-looking characters though, and worries about underage content can be tackled independently, or be found entirely futile. Not my problem; focus on what you can control. This is what's coming though. People already don't mind parasocial relationships with generative AI and already pay for that; just add nudity. | |
| ▲ | tick_tock_tick 2 days ago | parent | prev | next [-] | | It's going to be really weird when huge swaths of the internet are illegal to visit outside the USA because you keep running into that kind of AI-generated "content". |
| ▲ | simianwords 2 days ago | parent | prev [-] | | Why is this illegal btw? I mean what's stopping an AI company from releasing a proper NSFW model? I hope it doesn't happen but I want to know what prevents them from doing it now. | |
| ▲ | baq 2 days ago | parent [-] | | in some jurisdictions generating a swastika or a hammer and sickle is illegal. that said, I'm sure you can imagine that the really illegal, truly, positively sickening and immoral stuff is children-adjacent and you can be 100% sure there are sociopaths doing training runs for the broken people who'll buy the weights. | | |
| ▲ | simianwords 2 days ago | parent [-] | | Is it illegal to use mspaint to generate similar vile things? | | |
| ▲ | Majromax 2 days ago | parent | next [-] | | Not in the United States, but it is illegal in some jurisdictions. Additionally, the entire "payment processors leaning on Steam" thing shows that it might be very difficult to monetize a model that's known for generating extremely controversial content. Without monetization, it would be hard for any company to support the training (and potential release) of an unshackled enterprise-grade model. | |
| ▲ | tick_tock_tick 2 days ago | parent | prev [-] | | Most of Europe doesn't really have free speech, frankly most of the world doesn't. Privileges like making mspaint drawings of nearly whatever you want are pretty uniquely American. |
|
|
|
| |
| ▲ | xenobeb 2 days ago | parent | prev [-] | | The problem is the video models are only impressive in news stories about the video models. When you actually try to use them you can see how the marketing is playing to people's imagination because they are such a massive disappointment. | | |
| |
| ▲ | wslh 3 days ago | parent | prev | next [-] | | > Anecdotally moving from model to model I'm not seeing huge changes in many use cases. I can just pick an older model and often I can't tell the difference... Model specialization. For example a model with legal knowledge based on [private] sources not used until now. | |
| ▲ | dvfjsdhgfv 3 days ago | parent | prev | next [-] | | > I can just pick an older model and often I can't tell the difference... Or, as in the case of a leading North American LLM provider, I would love to be able to choose an older model but it chooses it for me instead. | |
| ▲ | darepublic 3 days ago | parent | prev | next [-] | | I hope you're right. | |
| ▲ | ljlolel 3 days ago | parent | prev [-] | | The scaling laws already predict diminishing returns |
|
|
| ▲ | DebtDeflation 3 days ago | parent | prev | next [-] |
| The wildest part is that the frontier models have a lifespan of 6 months or so. I don't see how it's sustainable to keep throwing this kind of money at training new models that will be obsolete in the blink of an eye. Unless you believe that AGI is truly just a few model generations away and once achieved it's game over for everyone but the winner. I don't. |
| |
| ▲ | jononor 3 days ago | parent | next [-] | | It is being played like a winner-takes-all market right now (it may or may not be such a market). So it is a game of being the one that is left standing, once the others fall off. In this kind of game, spending more is done as a strategy to increase the chances of other competitors running out of cash or otherwise hitting a wall. Sustainability is the opposite of the goal being pursued... Whether one reaches "AGI" is not considered important either, as long as one can starve out most competitors. And for the newcomers, the scale needs to be bigger than what the incumbents (Google and Microsoft) have as discretionary spending - which is at least a few billion per year. Because at that rate, those companies can sustain it forever and would be default winners. So I think yearly expenditure is going to be $20B+ per year. | |
| ▲ | leptons 3 days ago | parent | next [-] | | It's the Uber business plan - losing money until the competition loses more and goes out of business. So far Lyft seems to be doing okay, which proves the business plan doesn't really work. | | |
| ▲ | jononor 3 days ago | parent | next [-] | | Uber's market cap places it in the top 100 in the world, whereas Lyft is around 1/25th of Uber in market cap, and not even in the top 1000. I would consider that a success... That is basically as much winner-takes-all as one can realistically get in a global market. Cases where the top is just 5x the runner-up would still be very winner-oriented. | |
| ▲ | Workaccount2 3 days ago | parent | prev | next [-] | | There are endless examples of that business model working... | | |
| ▲ | oblio 2 days ago | parent [-] | | Are there? Which ones? I'm especially interested in companies that weren't built to be sold. |
| |
| ▲ | simianwords 2 days ago | parent | prev [-] | | Uber is profitable so why do you think it doesn't work? | | |
| ▲ | oblio 2 days ago | parent [-] | | Because the competition hasn't gone out of business (at least outside the US where tons of other ride hailing apps are available in most major locales) and because 16 (SIXTEEN!!!) years after founding Uber is still net profit negative: over its lifetime it has lost more money than it made. The only people that really benefited from Uber are: - Uber executives - early investors that saw the share price go up - early customers that got VC subsidized rides | | |
| ▲ | simianwords 2 days ago | parent [-] | | Are you predicting that they can't be net profitable? | | |
| ▲ | oblio 2 days ago | parent [-] | | No, I'm predicting that: 1. opportunity costs are a thing. 2. if you add up Uber's financial numbers since creation, the crazy amount of VC money that was invested in Uber would have provided better returns invested in the S&P 500. 3. Uber will settle in as a boring, profitable company that's going to be a side note in both the history of tech and also of transportation and will primarily be remembered for eroding worker rights. | |
| ▲ | simianwords 2 days ago | parent [-] | | I don't get your point. You would have still made more money investing in Uber than in S&P. | | |
| ▲ | oblio 2 days ago | parent [-] | | No, you wouldn't have, unless you were one of handful VCs or Uber execs (ok, and a bunch of pre-IPO Uber employees). Uber IPO May 2019: market cap $82bn. Uber now: $193bn. 2.35x multiplier. S&P 500 May 2019: $2750. S&P 500 now: $6460. 2.35x multiplier. So the much, much riskier Uber investment has barely matched a passive S&P 500 investment over the same time frame. And the business itself has lost money, more money was put into it than has been gotten back so far. I'm not even sure why I'm in this conversation as it seems ideological. I bring up facts and you bring up... vibes? | | |
| ▲ | simianwords 2 days ago | parent [-] | | Let me get this straight. I was replying to this: "So far Lyft seems to be doing okay, which proves the business plan doesn't really work." when I said Uber is profitable. Your retort to that was that S&P grew more than Uber, which is a nonsensical argument. Our standard for what is a good business is if it grows faster than S&P after going public? Edit: I dug up some research related to this, most companies do worse than S&P after becoming public. What's your point then? | |
| ▲ | nly 12 hours ago | parent [-] | | Most people can't invest in a company pre-IPO, so it's irrelevant. The same is currently true of Anthropic. |
|
|
|
|
|
|
|
| |
| ▲ | sdesol 3 days ago | parent | prev [-] | | > So it is a game of being the one that is left standing Or the last investor. When this type of money is raised, you can be sure the earlier investors are looking for ways to have a soft landing. | | |
| ▲ | tim333 2 days ago | parent [-] | | I'm not sure many investors are investing their own money. They are investing other people's money, maybe owned by shareholders of large companies in turn owned by our pension funds. | | |
| ▲ | sdesol 2 days ago | parent [-] | | It might not be their money, but they are paid a management fee and if they cannot provide some return, people will stop using them. | | |
| ▲ | tim333 19 hours ago | parent [-] | | The kind of thing that happens is Joe Bloggs runs the Fidelity Hot Tech fund, up 50% over the last three years. Then when it crashes that's closed and Joe is switched to the Fidelity Safe Income fund with no down years for the last five years. |
|
|
|
| |
| ▲ | solomonb 3 days ago | parent | prev [-] | | They are only getting deprecated this fast because the cost of training is in some sense sustainable. Once it is not, then they will no longer be deprecated so fast. | | |
| ▲ | utyop22 2 days ago | parent [-] | | Is it though? It's only sustainable to the extent that there is easy access to funding on favourable terms... |
|
|
|
| ▲ | andrewgleave 2 days ago | parent | prev | next [-] |
| > “There's kind of like two different ways you could describe what's happening in the model business right now. So, let's say in 2023, you train a model that costs 100 million dollars.
>
> And then you deploy it in 2024, and it makes $200 million of revenue. Meanwhile, because of the scaling laws, in 2024, you also train a model that costs a billion dollars. And then in 2025, you get $2 billion of revenue from that $1 billion, and you spend $10 billion to train the model.
>
> So, if you look in a conventional way at the profit and loss of the company, you've lost $100 million the first year, you've lost $800 million the second year, and you've lost $8 billion in the third year. So, it looks like it's getting worse and worse. If you consider each model to be a company, the model that was trained in 2023 was profitable.”
> ...
>
> “So, if every model was a company, the model is actually, in this example, is actually profitable. What's going on is that at the same time as you're reaping the benefits from one company, you're founding another company that's like much more expensive and requires much more upfront R&D investment. And so, the way that it's going to shake out is this will keep going up until the numbers go very large, the models can't get larger, and then it will be a large, very profitable business, or at some point, the models will stop getting better.
>
> The march to AGI will be halted for some reason, and then perhaps it will be some overhang, so there will be a one-time, oh man, we spent a lot of money and we didn't get anything for it, and then the business returns to whatever scale it was at.”
> ...
>
> “The only relevant questions are, at how large a scale do we reach equilibrium, and is there ever an overshoot?” From Dario’s interview on Cheeky Pint: https://podcasts.apple.com/gb/podcast/cheeky-pint/id18210553... |
|
| ▲ | nradov 3 days ago | parent | prev | next [-] |
| That's why wealthy investors connected to the AI industry are also throwing a lot of money into power generation startups, particularly fusion power. I doubt that any of them will actually deliver commercially viable fusion reactors but hope springs eternal. |
| |
| ▲ | vrt_ 3 days ago | parent | next [-] | | Imagine solving energy as a side effect of this compute race. There's finally a reason for big money to be invested into energy infrastructure and innovation to solve a problem that can't be solved with traditional approaches. | | |
| ▲ | bobsmooth 3 days ago | parent [-] | | I would trade the destruction of trustworthy information and images on the internet for clean fusion power. It's a steep cost but I think it's worth it. |
| |
| ▲ | mapt 3 days ago | parent | prev [-] | | Continuing to carve out economies of scale in battery + photovoltaic for another ten doublings has plenty of positive externalities. The problem is that in the meantime, they're going to nuke our existing powergrid, created in the 1920's to 1950's to serve our population as it was in the 1970's, and for the most part not expanded since. All of the delta is in price-mediated "demand reduction" of existing users. | | |
| ▲ | UltraSane 3 days ago | parent [-] | | A lot of the biggest data centers being built are also building behind-the-meter generation dedicated to them. | | |
|
|
|
| ▲ | cjbgkagh 2 days ago | parent | prev | next [-] |
| That’s like being upset that you can’t dig your own Suez Canal. So long as there is competition it’ll be available at marginal cost. And there is plenty of innovation that can be done on the edges, and not all of machine learning is LLMs. |
| |
| ▲ | mlyle 2 days ago | parent [-] | | > So long as there is competition it’ll be available at marginal cost. Most things are not perfect competition, so you get MR=MC not P=MC. We're talking about massive capital costs. Another name for massive capital costs are "barriers to entry". | | |
| ▲ | cjbgkagh 2 days ago | parent [-] | | Granted that capital costs are a barrier to entry and that barriers to entry lead to non-perfect competition, but the exploitability is limited in the case of LLMs because they exist on a sub-linear utility scale. In LLMs, 2x the price is not 2x as useful; this means a new entrant can enter the lower end of the market and work their way up. The only way to prevent that is for the incumbent to keep prices as close to marginal cost as possible. There is a natural monopoly aspect given the ability to train and data mine on private usage data, but in general improvements in the algorithms and training seem to be dominating advancements. Microsoft's search engine Bing paid an absolute fortune for access to usage data and they were unable to capitalize on it. LLMs have the unusual property that a lot of value can be extracted out of fine-tuning for specialized purposes, which opens the door to a million little niches, providing fertile ground for future competitors. This is one area where being a fast follower makes a lot of sense. | | |
| ▲ | mlyle 2 days ago | parent [-] | | Almost anything has a utility scale which is diminishing. But we still see MR=MC pricing in industries with barriers to entry (IPR, capital costs). TSMC and Mercedes don't price cheap to avoid giving others a toehold. > There is a natural monopoly aspect given the ability to train and data mine on private usage data but in general improvements in the algorithms and training seem to be dominating advancements. There's pretty big economies of scale with inference-- the magic of how to route correctly with experts to conduct batching while keeping latency low. It's an expensive technology to create, and there's a large minimum scale where it works well. | | |
| ▲ | cjbgkagh a day ago | parent [-] | | I’m unconvinced that the lessons learned from scaling will constitute much of a moat. There is certainly an incentive for incumbents to give such an impression. | | |
| ▲ | mlyle 16 hours ago | parent [-] | | Probably not. But we don't tend to see P=MC where there's any differentiation or barrier to entry, and I do not believe that AI is fully commoditized or will be soon. | | |
| ▲ | cjbgkagh 14 hours ago | parent [-] | | Perhaps my 'marginal cost' should have been taken to be 'near marginal cost' and as such I don't believe it'll be fully commoditized either, just mostly commoditized in that it becomes impossible to extract meaningful monopolistic rents. Similarly I don't believe in perfect market efficiency so nothing would be exactly at marginal cost. At the moment we have investment subsidizing use so often these APIs are available at below marginal cost - the old adage 'we lose money on every sale but make up for it in volume'. I'm not confident that these investors on average will be able to make their money back plus a required rate of return and for many it's probably not the primary point of investing in this industry. If you have a $1B dollars to spare you too can light it on fire... | | |
| ▲ | mlyle 14 hours ago | parent [-] | | I don't believe the APIs are actually under marginal cost. Inference is cheap, but people are using a lot of it. The problem is, the arguments you're making could be used for almost any industry, including ones that we've seen sustained excess profits. | | |
| ▲ | cjbgkagh 13 hours ago | parent [-] | | It depends on how you define sustained and define excess profits. Being an unusually effective and well managed company can certainly yield sustained above average returns but that hardly means that is a meaningful barrier to entry - the question at hand. Ozempic is marked up 200x (20,000%) because they're able to extract monopolistic rents, that's a completely different ballpark of 2x or even 5x markups. In a fully commoditized industry participants yield average ROIs, in a mature market it's industry dependent but I would consider ranges from 10% to 100% above average ROIs to be reasonably normal. It's when things get to 10x to > 100x that I would consider to be able to extract monopolistic rents. I know I'm mixing up profits (ROIs) with markups on COGs with the issue being that companies extracting monopolistic rents tend to obscure the fact with padded expenses and the financial details needed to calculate Total Production Cost per Unit are generally not available. |
|
|
|
|
|
|
|
|
|
| ▲ | docdeek 3 days ago | parent | prev | next [-] |
| > The compute moat is getting absolutely insane. We're basically at the point where you need a small country's GDP just to stay in the game for one more generation of models. For what it is worth, $13 billion is about the GDP of Somalia (about 150th in nominal GDP) with a population of 15 million people. |
| |
| ▲ | Aeolun 3 days ago | parent | next [-] | | As a fun comparison, since the population is more or less the same: the GDP of the Netherlands is about $1.2 trillion with a population of 18 million people. I understand that’s not quite what’s meant by ‘small country’, but by both population and size the comparison doesn’t necessarily seem accurate. | |
| ▲ | Aurornis 2 days ago | parent | prev [-] | | Country scale is weird because it has such a large range. California (where Anthropic is headquartered) has over twice as many people as all of Somalia. The state of California has a GDP of $4.1 Trillion. $13 billion is a rounding error at that scale. Even the San Francisco Bay Area alone has around half as many people as Somalia. |
|
|
| ▲ | powerapple 2 days ago | parent | prev | next [-] |
| Also, not all of the compute was necessary for the final model; a large chunk of it is trial-and-error research. In theory, for the $1B you spent training the latest model, a competitor will be able to do the same six months later for $100M. |
| |
| ▲ | SchemaLoad 2 days ago | parent [-] | | Not only are the actual models rapidly devaluing, the hardware is too. Spend $1B on GPUs and next year there's a much better model out that massively devalues your existing datacenter. These companies are building mountains of quicksand that they have to constantly pour more cash onto, or else they'll rapidly be reduced to having no advantage. | |
| ▲ | utyop22 2 days ago | parent | next [-] | | Yes indeed if we look at it from this equation: FCFF = EBIT(1-t) - Reinvestment If the hardware needs constant replacement, that Reinvestment number will always remain higher than what most people think. In fact, it seems none of these investments are fixed. Therefore there are no economies of scale (as it stands right now). | |
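A minimal sketch of that point, with purely hypothetical numbers: if the GPU fleet has to be refreshed roughly as fast as it becomes obsolete, maintenance-level reinvestment alone can swallow the after-tax operating profit.

```python
# FCFF = EBIT * (1 - t) - Reinvestment, per the comment above. All figures are hypothetical.
def fcff(ebit, tax_rate, reinvestment):
    return ebit * (1 - tax_rate) - reinvestment

ebit = 5e9          # assumed operating profit
tax_rate = 0.21
fleet_cost = 20e9   # assumed installed GPU fleet
useful_life = 4     # assumed years before the fleet is effectively obsolete

# Refreshing the fleet every `useful_life` years just to stand still costs fleet_cost / useful_life per year.
reinvestment = fleet_cost / useful_life
print(f"FCFF = {fcff(ebit, tax_rate, reinvestment):,.0f}")  # about -1B under these assumptions
```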
| ▲ | chermi a day ago | parent | prev [-] | | Ignoring energy costs(!), I'm interested in the following. Say every server generation from Nvidia is 25% "better at training", by whatever metric (1). Could you not theoretically wire together 1.25 + delta times as many of the previous generation to get the same compute? The delta accounts for latency/bandwidth from interconnects. I'm guessing delta is fairly large given my impression of how important HBM and networking are. I don't know the efficiency gains per generation, but let's just say getting the same compute with this 1.25+delta system requires 2x the energy. My impression is that while energy is a substantial cost, the total cost for a training run is still dominated by the actual hardware+infrastructure. It seems like there must be some break-even point where you could use older generation servers and come out ahead. Probably everyone has this figured out and consequently the resale value of previous-gen chips is quite high? What's the lifespan at full load of these servers? I think I read CoreWeave depreciates them (somewhat controversially) over 4 years. Assuming the chips last long enough, even if they're not usable for LLM training/serving inference, can't they be reused for scientific loads? I'm not exactly old, but back in my PhD days we were building our own little GPU clusters for MD simulations. I don't think long MD simulations are the best use of compute these days, but there are many similar problems like weather modeling, high-dimensional optimization problems, materials/radiation studies, and generic simulations like FEA or simply large systems of ODEs. Are these big clusters being turned into hand-me-downs for other scientific/engineering problems like the above, or do they simply burn them out? What's a realistic expected lifespan for a B200? Or maybe it's as simple as they immediately turn their last-gen servers over to serve inference? Lots of questions, but my main question is just how much the hardware is devalued once it becomes previous gen. Any guidance/references appreciated! Also, for anyone still in the academic computing world: do outfits like D. E. Shaw still exist trying to run massive MD simulations or similar? Do the big national computing centers use the latest, greatest Nvidia AI servers or something a little more modest? Or maybe they're still just massive CPU servers? While I have anyone who might know: whatever happened to that fad from 10+ years ago saying a lot of compute/algorithms would be shifting toward more memory-heavy models (2)? Seems like it kind of happened in AI at least. (1) Yes, I know it's complicated, especially with memory stuff. (2) I wanna say it was IBM Almaden championing the idea. | | |
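One way to frame that break-even question as a back-of-envelope sketch. Every number below (unit prices, power draw, electricity rate, the 1.3x unit count) is a placeholder assumption, not vendor data; the takeaway is only that the answer hinges on the resale price of the old parts and the cost of power.

```python
# Total cost of equal training throughput: fewer new-gen units vs. more, cheaper, hungrier old-gen units.
def total_cost(units, unit_price, power_kw_per_unit, years, usd_per_kwh=0.08):
    energy = units * power_kw_per_unit * 24 * 365 * years * usd_per_kwh
    return units * unit_price + energy

# Target: the throughput of 1,000 new-gen units over a 4-year run (all numbers assumed).
new_gen = total_cost(units=1000, unit_price=40_000, power_kw_per_unit=1.0, years=4)

# Assume last gen needs 1.3x the units (25% slower plus interconnect overhead)
# and draws roughly 2x the total power for the same work.
old_gen = total_cost(units=1300, unit_price=15_000, power_kw_per_unit=1.6, years=4)

print(f"new-gen: ${new_gen/1e6:.1f}M, old-gen: ${old_gen/1e6:.1f}M")
# With these placeholders the old-gen build comes out cheaper; the comparison flips as
# old-gen prices rise toward new-gen prices or as power costs climb, which is essentially
# the point made in the reply below.
```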
| ▲ | SchemaLoad a day ago | parent [-] | | I'm not the one building out datacenters, but I believe the power consumption is the reason for the devaluation. It's the same reason we saw bitcoin miners throw all their ASICs in the bin every 6 months. At some point it becomes cheaper to buy new hardware than to keep running the old inefficient chips: when the power savings of the new chips exceed their purchase price. These AI data centers are chewing up unimaginable amounts of power, so if Nvidia releases a new chip that does the same work at half the power consumption, that whole datacenter of GPUs is massively devalued. The whole AI industry is looking like there won't be a first-mover advantage, and if anything there will be a late-mover advantage when you can buy the better chips and skip burning money on the old generations. |
|
|
|
|
| ▲ | AlienRobot 3 days ago | parent | prev | next [-] |
| I saw a story posted on reddit that U.S. engineers went to China and said the U.S. would lose the A.I. game because THE ENERGY GRID was much worse than China's. That's just pure insanity to me. It's not even Internet speed or hardware. It's literally not having enough electricity. What is going on with the world... |
| |
| ▲ | ipython 2 days ago | parent [-] | | Not to mention water for cooling. Large data centers can use 1 million+ gallons per day. | | |
| ▲ | xnx 2 days ago | parent [-] | | 1 million gallons is approximately 0.5 seconds of flow of the Columbia river. | | |
| ▲ | wiredpancake 2 days ago | parent [-] | | It means nothing when most of the water is recycled anyway. It's not like the GPUs actually drink the stuff; the water just runs through heatsinks and is cycled around. | |
| ▲ | xnx 2 days ago | parent [-] | | That's true for closed-loop systems, but some data centers use evaporative cooling because it is more energy efficient. |
|
|
|
|
|
| ▲ | worldsayshi 3 days ago | parent | prev | next [-] |
| And we're still sort of on the fence about whether it's even that useful? Like sure, it saves me a bit of time here and there, but will scaling up really solve the reliability issues that are the real bottleneck? |
| |
| ▲ | bravetraveler 3 days ago | parent | next [-] | | Assuming the best case: we're going to need to turn this productivity into houses or lifestyle improvement, soon... or I'm just going out with Sasquatch | | |
| ▲ | worldsayshi 2 days ago | parent [-] | | While decoding your comment I'm going to assume Sasquatch to be a semi-underground (no web site, only calls) un-startup that specializes in survival kits for people leaving civilization behind. Like calling the vacuum repair store but more hippie themed. | | |
| |
| ▲ | SchemaLoad 2 days ago | parent | prev | next [-] | | I feel like it's pretty settled that they are a little bit useful, as a faster search engine, or being able to automatically sort my emails. But the value is nowhere near justifying the investment. | |
| ▲ | mountainriver 2 days ago | parent | prev [-] | | Off software codegen alone it is beyond useful |
|
|
| ▲ | jayd16 3 days ago | parent | prev | next [-] |
| In this imaginary timeline where initial investments keep increasing this way, how long before we see a leak shutter a company? Once the model is out, no one would pay for it, right? |
| |
| ▲ | jsheard 3 days ago | parent | next [-] | | Whatever happens if/when a flagship model leaks, the legal fallout would be very funny to watch. Lawyers desperately trying to thread the needle such that training on libgen is fair use, but training on leaked weights warrants the death penalty. | |
| ▲ | marcosdumay 3 days ago | parent | prev | next [-] | | In this imaginary reality where LLMs just keep getting better and better, all that a leak means is that you will eat up your capital until you release your next generation. And you will want to release that very quickly either way, so you should have a problem for a few months at most. And if LLMs don't keep getting qualitatively more capable every few months, that means that all this investment won't pay off and people will soon just use some open weights for everything. |
| ▲ | wmf 3 days ago | parent | prev | next [-] | | You can't run Claude on your PC; you need servers. Companies that have that kind of hardware are not going to touch a pirated model. And the next model will be out in a few months anyway. | | |
| ▲ | jayd16 3 days ago | parent [-] | | If it were worth it, you'd see some easy self hostable package, no? And by definition, it's profitable to self-host or these AI companies are in trouble. | |
| ▲ | serf 2 days ago | parent | next [-] | | I think this misunderstands the scale of these models. And honestly I don't think a lot of these companies would turn a profit on pure utility -- the electric and water company doesn't advertise like these groups do; I think that probably means something. | | |
| ▲ | jayd16 2 days ago | parent [-] | | What's the scale for inference? Is it truly that immense? Can you ballpark what you think would make such a thing impossible? > the electric and water company doesn't advertise like these groups do I'm trying to understand what you mean here. In the US these utilities usually operate in a monopoly so there's no point in advertising. Cell service has plenty of advertising though. |
| |
| ▲ | tick_tock_tick 2 days ago | parent | prev | next [-] | | You need 100+ gigs of RAM and a top-of-the-line GPU to run legacy models at home. Maybe, if you push it, that setup will let you handle 2, maybe 3, people. You think anyone is going to make money on that vs $20 a month to Anthropic? | |
| ▲ | lelanthran 2 days ago | parent | next [-] | | > You need a 100+gigs ram and a top of the line GPU to run legacy models at home. Maybe if you push it that setup will let you handle 2 people maybe 3 people. This doesn't seem correct. I run legacy models with only slightly reduced performance on 32GB RAM with a 12GB VRAM GPU right now. BTW, that's not an expensive setup. > You think anyone is going to make money on that vs $20 a month to anthropic? Why does it have to be run as a profit-making machine for other users? It can run as a useful service for the entire household, when running at home. After all, we're not talking about specialised coding agents using this[1], just normal user requests. ==================================== [1] For an outlay of $1k for a new GPU I can run a reduced-performance coding LLM. Once again, when it's only myself using it, the economics work out. I don't need the agent to be fully autonomous because I'm not vibe coding - I can take the reduced-performance output, fix it and use it. | | |
| ▲ | tick_tock_tick a day ago | parent | next [-] | | Just your GPU, not counting the rest of the system, costs 4 years of subscription, and with the sub you get the new models, which your existing hardware will likely not be able to run at all. It's closer to $3k to build a machine that you can reasonably use, which is 12 whole years of subscription. It's not hard to see why no one is doing it. | |
| ▲ | lelanthran a day ago | parent [-] | | > Just your GPU not counting the rest of the system costs 4 years of subscription With my existing setup for non-coding tasks (GPU is a 3060 12GB which I bought prior to wanting local LLM inference, but use it now for that purpose anyway) the GPU alone was a once-off ~$350 cost (https://www.newegg.com/gigabyte-windforce-oc-gv-n3060wf2oc-1...). It gives me literally unlimited requests, not pseudo-unlimited as I get from ChatGPT, Claude and Gemini. > and with the sub you get the new models where your existing hardware will likely not be able to run it at all. I'm not sure about that. Why wouldn't the new LLM models run on a 4yo GPU? Wasn't a primary selling point of the newer models being "They use less computation for inference"? Now, of course there are limitations, but for non-coding usage (of which there is a lot) this cheap setup appears to be fine. > It's closer to $3k to build a machine that you can reasonable use which is 12 whole years of subscription. It's not hard to see why no one is doing it. But there are people doing it. Lots, actually, and not just for research purposes. With the costs apparently still falling, with each passing month it gets more viable to self-host, not less. The calculus looks even better when you have a small group (say 3 - 5 developers) needing inference for an agent; then you can get a 5060ti with 16GB RAM for slightly over $1000. The limited RAM means it won't perform as well, but at that performance the agent will still capable of writing 90% of boilerplate, making edits, etc. These companies (Anthropic, OpenAI, etc) are at the bottom of the value chain, because they are selling tokens, not solutions. When you can generate your own tokens continuously 24x7, does it matter if you generate at half the speed? | | |
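The arithmetic in this subthread fits in a few lines. The $350 GPU and the $20/month subscription come from the comments above; the power draw, daily usage, and electricity rate are assumptions, and they are what tip the result one way or the other.

```python
# Payback period for a one-off GPU purchase vs. a monthly subscription.
gpu_cost = 350            # used 3060 12GB, per the comment above
subscription = 20         # $/month, per the comment above
extra_power_watts = 170   # assumed average extra draw while the card is busy
hours_per_day = 2         # assumed usage
usd_per_kwh = 0.15        # assumed residential electricity rate

monthly_power = extra_power_watts / 1000 * hours_per_day * 30 * usd_per_kwh
payback_months = gpu_cost / (subscription - monthly_power)
print(f"~${monthly_power:.2f}/month in power, payback in ~{payback_months:.0f} months")  # about 19 months here
```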
| ▲ | tick_tock_tick a day ago | parent [-] | | > does it matter if you generate at half the speed? Yes, massively. It's not even linear: 1/2 speed is probably 1/8 or less of the value of "full speed". It's going to be even more pronounced as "full speed" gets faster. | |
| ▲ | lelanthran 21 hours ago | parent [-] | | > Yes, massively it's not even linear 1/2 speed is probably 1/8 or less the value of "full speed". It's going to be even more pronounced as "full speed" gets faster. I don't think that's true for most use-cases (content generation, including artwork, code/software, reading material, summarising, etc). Something that takes a day without an LLM might take only 30m with GPT5 (artwork), or maybe one hour with Claude Code. Does the user really care that their full-day artwork task is now one hour and not 30m? Or that their full-day coding task is now only two hours, and not one hour? After all, from day one of the ChatGPT release, literally no one complained that it was too slow (and it was much slower than it is now). Right now no one is asking for faster token generation, everyone is asking for more accurate solutions, even at the expense of speed. |
|
|
| |
| ▲ | jayd16 2 days ago | parent | prev [-] | | Plus, when you're hosting it yourself, you can be reckless with what you feed it. Pricing in the privacy gain, it seems like self hosting would be worth the effort/cost. |
| |
| ▲ | jayd16 2 days ago | parent | prev | next [-] | | Can you explain to me where Anthropic (or its investors) expect to be making money if that's what it actually costs to run this stuff? | |
| ▲ | lelanthran 2 days ago | parent [-] | | > Can you explain to me where Anthropic (or it's investors) expect to be making money if that's what it actually costs to run this stuff? Not the GP (in fact I just replied to GP, disagreeing with them), but I think that economies of scale kick in when you are provisioning M GPUs for N users and both M and N are large. When you are provisioning for N=1 (a single user), then M=1 is the minimum you need, which makes it very expensive per user. When N=5 and M is still 1, then the cost per user is roughly a fifth of the original single-user cost. |
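A tiny sketch of that scaling argument, with a made-up per-GPU cost; the real gains come from batching many concurrent users onto the same hardware, which a single-user home setup cannot exploit.

```python
# Per-user serving cost falls roughly as 1/N while the shared GPUs stay busy. Cost is hypothetical.
gpu_monthly_cost = 2000   # assumed all-in monthly cost of one inference GPU

for users_per_gpu in (1, 5, 50, 500):
    print(f"{users_per_gpu:>3} users/GPU -> ${gpu_monthly_cost / users_per_gpu:,.2f} per user per month")
```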
| |
| ▲ | 2 days ago | parent | prev [-] | | [deleted] |
| |
| ▲ | quotemstr 2 days ago | parent | prev [-] | | Does your "self hostable package" come with its own electric substation? | | |
|
| |
| ▲ | fredoliveira 3 days ago | parent | prev | next [-] | | > Once the model is out, no one would pay for it, right? Well who does the inference at the scale we're talking about here? That's (a key part of) the moat. | |
| ▲ | petesergeant 2 days ago | parent | prev | next [-] | | gpt-oss-120b has cost OpenAI virtually all of my revenue, because I can pay Cerebras and Groq a fraction of what I was paying for o4-mini and get dramatically faster inference, for a model that passes my eval suite. This is to say, I think high-quality "open" models that are _good enough_ are a much bigger threat. Even more so since OpenRouter has essentially commoditized generation. Each new commercial model needs to not just be better than the previous version, it needs to be significantly better than the SOTA open models for the bread-and-butter generation that I'm willing to pay the developer a premium to use their resources for generation. | |
| ▲ | paganel 3 days ago | parent | prev [-] | | There's the opportunity cost here of those resources (and not talking only about the money) not being spent on power generation that actually benefits the individual consumer. |
|
|
| ▲ | derefr 3 days ago | parent | prev | next [-] |
| > privatization You think any of these clusters large enough to be interesting aren't under a contractual obligation to run any/all submitted state military/intelligence workloads alongside their commercial workloads? And perhaps even to prioritize those state-submitted workloads, when tagged with flash priority, to the point of evicting their own workloads? (This is, after all, the main reason that the US "Framework for Artificial Intelligence Diffusion" was created: America believed China would steal time on any private Chinese GPU cluster for Chinese military/intelligence purposes. Why would they believe that? Probably because it's what the US thought any reasonable actor would do, because it's what they were doing.) These clusters might make private profits for private shareholders... but so do defense subcontractors. |
|
| ▲ | belter 2 days ago | parent | prev | next [-] |
| The AI story is over. One more unimpressive release of ChatGPT or Claude, another $2 billion spent by Zuckerberg on subpar AI offers, and the final realization by CNBC that all of AI right now... is just code generators, will do it. You will have ghost data centers in excess, like you have ghost cities in China. |
|
| ▲ | Razengan 3 days ago | parent | prev | next [-] |
| Barely 50 years ago computers used to cost a million dollars and were less powerful than your phone's SIM card. > GPT-4 training was what, $100M? GPT-5/Opus-4 class probably $1B+? Your brain? Basically free *(not counting time + food) Disruption in this space will come from whoever can replicate analog neurons in a better way. Maybe one day you'll be able to Matrix information directly into your brain and know kung-fu in an instant. Maybe we'll even have a Mentat social class. |
| |
| ▲ | jcranmer 2 days ago | parent | next [-] | | > Barely 50 years ago computers used to cost a million dollars and were less powerful than your phone's SIM card. Fifty years ago, we were starting to see the very beginning of workstations (not quite the personal computer of modern days), something like this: https://en.wikipedia.org/wiki/Xerox_Alto, which cost ~$100k in inflation-adjusted money. | |
| ▲ | psychoslave 2 days ago | parent | prev [-] | | Yeah, no hate for kung fu here, but maybe learning to communicate better with each other, act in ways that allow everyone to thrive in harmony, and spread peace among all humanity might be a better thing to start incorporating, might it not? | |
| ▲ | Razengan 2 days ago | parent [-] | | It's literally a scene from The Matrix. | | |
| ▲ | psychoslave a day ago | parent [-] | | Yes it is. We can also maybe agree that the comment wasn't implying otherwise? I mean, it's like the djinn giving you three wishes, and not a single character will ask "what are the two best wishes I can make to (ensure mankind reaches perpetually peaceful, harmonious, flourishing social dynamics forever | whatever goal the character might have as their greatest hope)". When you have an instant, perfect knowledge-acquisition machine at your disposal, the obvious first thing to learn is what the most important things to do to reach your goal are. The film didn't mention everything Neo learned like that though, just that he kept accumulating skills straight for many hours. If it weren't an action movie, you would certainly hope the character's first words after such an impressive feat wouldn't be "I know kung fu". |
|
|
|
|
| ▲ | willvarfar 3 days ago | parent | prev | next [-] |
| As humans don't actually work like LLMs do, we can surmise that there are far more efficient ways to get to AGI. We just need to find them. |
| |
| ▲ | ijidak 3 days ago | parent [-] | | Can you elaborate? The technology to build a human brain would cost billions in today’s dollars. Are you thinking more so about energy efficiency? | |
| ▲ | robotresearcher 2 days ago | parent | next [-] | | We make hundreds of millions of brains a year for the cost of their parents’ food and shelter. That’s the known minimum cost. We have a lot of room to get costs down if we can figure out how. | |
| ▲ | xnx 2 days ago | parent | prev [-] | | > The technology to build a human brain would cost billions in today’s dollars I'm reminded of how insanely complex the human brain is: ~100 trillion connections. The Nvidia H100 has just 0.08 trillion transistors. |
|
|
|
| ▲ | maqp 3 days ago | parent | prev | next [-] |
| >You can have all the talent in the world but if you can't get 100k H100s and a dedicated power plant, you're out. I really have to wonder, how long will it be before the competition moves into who has the most wafer-scale engines. I mean, surely the GPU is a more inefficient packaging form factor than large dies with on-board HBM, with a massive single block cooler? |
| |
| ▲ | mfro 3 days ago | parent [-] | | The sentiment I have heard is that manufacturers do not want to increase die size because defects per die increase at the same time. | |
| ▲ | Workaccount2 3 days ago | parent | next [-] | | Meanwhile at Cerebras...heh But I do believe that their cost per compute is still far more than disparate chips. | |
| ▲ | 15155 2 days ago | parent | prev [-] | | This is why chiplets are used. |
|
|
|
| ▲ | me551ah 3 days ago | parent | prev | next [-] |
| And distillation makes the compute moat irrelevant. You could spend trillions to train a model, but some company is going to get enough data from your model and distill its own at a much cheaper upfront cost. This would allow them to offer it at a cheaper inference cost too, totally defeating the point of spending crazy money on training. |
| |
| ▲ | fredoliveira 3 days ago | parent [-] | | A couple of counter-arguments: Labs can just step up the way they track signs of prompts meant for model distillation. Distillation requires a fairly large number of prompt/response tuples, and I am quite certain that all of the main labs have the capability to detect and impede that type of use if they put their backs into it. Distillation doesn't make the compute moat irrelevant. You can get good results from distillation, but (intuitively, maybe I'm wrong here because I haven't done evals on this myself) you can't beat the upstream model in performance. That means that most (albeit obviously not all) customers will simply gravitate toward the better-performing model if the cost/token ratio is aligned for them. Are there always going to be smaller labs? Sure, yes. Is the compute moat real, and does it matter? Absolutely. | |
| ▲ | serf 2 days ago | parent [-] | | >Labs can just step up the way they track signs of prompts meant for model distillation. Distillation requires a fairly large number of prompt/response tuples, and I am quite certain that all of the main labs have the capability to detect and impede that type of use if they put their backs into it. ....while degrading their service for paying customers. This is the same problem as law-enforcement-agency forwarding threats and training LLMs to avoid user-harm -- it's great if it works as intended, but more often than not it throws a lot more prompt cancellations at actual users by mistake, refuses queries erroneously -- and just ruins user experience. i'm not convinced any of the groups can avoid distillation without ruining customer experience. |
|
|
|
| ▲ | matthewdgreen 2 days ago | parent | prev | next [-] |
| What’s the hardware capability doubling rate for GPUs in clusters? Or (since I know that’s complicated to answer for dozens of reasons): on average how many months has it been taking for the hardware cost of training the previous generation of models to halve, excluding algorithmic improvements? |
|
| ▲ | senko 3 days ago | parent | prev | next [-] |
| > We're basically at the point where you need a small country's GDP just to stay in the game for one more generation of models. When you consider where most of that money ends up (Jensen &co), it's bizarre nobody can really challenge their monopoly - still. |
|
| ▲ | 2OEH8eoCRo0 3 days ago | parent | prev | next [-] |
| A lot of moats are just money. Money to buy competition, capture regulation, buy exclusivity, etc. |
|
| ▲ | SilverElfin 2 days ago | parent | prev | next [-] |
| The other problem is that big companies can take a loss and starve out any competition. They already make a ton of money from various monopolies. And they do not have the distraction of needing to find funding continuously. They can just keep selling these services at a loss until they’re the only ones left. That’s leaving aside the advantages they have elsewhere - like all the data only they can access for training. For example, it is unfair that Google can use YouTube data, but no one else can. How can that be fair competition? And they can also survive copyright lawsuits with their money. And so on. |
|
| ▲ | ants_everywhere 2 days ago | parent | prev | next [-] |
| > What gets me is that this isn't even a software moat anymore - it's literally just whoever can get their hands on enough GPUs and power infrastructure. I'm curious to hear from experts how much this is true if interpreted literally. I definitely see that having hardware is a necessary condition. But is it also a sufficient condition these days? ... as in is there currently no measurable advantage to having in-house AI training and research expertise? Not to say that OP meant it literally. It's just a good segue to a question I've been wondering about. |
|
| ▲ | sidewndr46 3 days ago | parent | prev | next [-] |
| I'm not an expert at how private investment rounds work, but aren't most "raises" of AI companies just huge commitments of compute capacity? Either pre-existing or build-out. |
| |
| ▲ | serf 2 days ago | parent [-] | | it's difficult for me to imagine this level of compute existing and sitting there idle somewhere; it just doesn't make sense. So we can at least assume that whoever is deciding to move the capacity does so at some business risk elsewhere. | | |
| ▲ | sidewndr46 14 hours ago | parent [-] | | I've stayed away from the hyperscalers but worked at places where requesting 400 servers for a task was normal and routine. Understanding scale is a weird thing, that I guess is psychological. I think the different experiences and travels I have made have an impact on this. Despite living for over a decade in Texas I live in one of the most densely populated places. But I recently got to visit Colorado, which is far less populated and has lots of weird places. You can drive up to the base of the Great Sand Dunes and walk up the first few hills quite easily if you're in good shape. Here's some photos https://www.hydrogen18.com/p/2024-great-sand-dunes-national-... If you pull out your smartphone and look at Google Maps, it becomes pretty obvious how insane the scale of the place is. There's no real prohibition on where you can walk there because it isn't necessary. It's so large and the environment is so harsh, you aren't going to cross much of it on foot. Ever. |
|
|
|
| ▲ | ath3nd 2 days ago | parent | prev | next [-] |
| > GPT-7 will need its own sovereign wealth fund If the diminishing returns that we see now continue to prove true, ChatGPT-6 will already not be financially viable, so I doubt there will be a GPT-7 that can live up to the big version bump. Many folks already consider GPT-5 to be more like GPT-4.1. I personally am very bearish on Anthropic and OpenAI. |
|
| ▲ | mikewarot 2 days ago | parent | prev | next [-] |
| Most of that power usage is moving data and weights into multiply accumulate hardware, then moving the data out. The actual computation is a fairly small fraction of the power consumed. It's quite likely that an order of magnitude improvement can be had. This is an enormous incentive signal for someone to follow. |
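A rough sanity check of that claim. The per-operation figures below are the commonly cited ~45 nm estimates from Horowitz's ISSCC 2014 keynote, used here only as order-of-magnitude assumptions rather than measurements of any current accelerator.

```python
# Energy to fetch operands from DRAM vs. energy to do the arithmetic on them.
PJ_FP32_MULT = 3.7     # pJ per 32-bit float multiply (assumed, ~45 nm estimate)
PJ_FP32_ADD  = 0.9     # pJ per 32-bit float add (assumed)
PJ_DRAM_READ = 640.0   # pJ per 32-bit word read from DRAM (assumed)

mac = PJ_FP32_MULT + PJ_FP32_ADD   # one multiply-accumulate
fetch = 3 * PJ_DRAM_READ           # naive worst case: read two operands, write one result
print(f"data movement / compute ≈ {fetch / mac:.0f}x")  # a few hundred x if nothing is cached or reused
```

Caches, on-chip SRAM, and operand reuse shrink that ratio enormously, which is where the order-of-magnitude improvement the comment mentions would have to come from.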
|
| ▲ | madduci 3 days ago | parent | prev | next [-] |
| And just now came the email with the changes to their terms of usage and policy. Nice timing? I am sure they have scored a deal involving the sale of personal data. |
|
| ▲ | scellus 3 days ago | parent | prev | next [-] |
| So far it doesn't seem like winner-take-all, and all the major players (OpenAI, Anthropic, xAI, Google, Meta?) are backed by strong partnerships and a lot of capital. It is capital-intensive this round though, so the primary producers are big and few. As long as they compete, benefits mostly go to other parties (= society) through increased productivity. |
|
| ▲ | ericmcer 3 days ago | parent | prev | next [-] |
| Could they vastly reduce this cost by specializing models? Like is a general know everything model exponentially more expensive than one that deeply understands a single topic (like programming, construction, astrophysics, whatever)? Is there room for a smaller team to beat Anthropic/OpenAI/etc. at a single subject matter? |
|
| ▲ | delusional 2 days ago | parent | prev | next [-] |
| > The compute moat Does this really describe a "moat" or are you just describing capital? The capitalization is getting insane. We're basically at the point where you need more capital than a small nation's GDP. That sounds much more accurate to my ears, and much more troubling. |
|
| ▲ | up2isomorphism 2 days ago | parent | prev | next [-] |
| There are no generational differences between these models. I tested Cursor with all the different backends and they are similar in most cases. The so-called race is just a Wall Street sensation to bump the stock price. |
|
| ▲ | protocolture 2 days ago | parent | prev | next [-] |
| >The compute moat is getting absolutely insane. Is it? Seems like there's a tiny performance gain between "This runs fine on my laptop" and "This required a $10B data centre". I don't see any moat, just crazy investment hoping to crack the next thing and moat that. |
|
| ▲ | BobbyTables2 2 days ago | parent | prev | next [-] |
| Until one day an outsider finds a new approach for LLMs that vastly reduces the computational complexity. And then we’ll realize we wasted an entire Apollo space program to build an over-complicated autocompleter. |
|
| ▲ | noosphr 2 days ago | parent | prev | next [-] |
| My hope is that this hype cycle overbuilds nuclear power capacity so much that we end up using it to sequester carbon dioxide from the atmosphere once the bubble pops and electricity prices become negative for most of the day. In the medium term, China has so much spare capacity that they may be the only game in town for high-end models, while the US will be trying to fix a grid with 50 years of deferred maintenance. |
|
| ▲ | tootie 2 days ago | parent | prev | next [-] |
| This is why Nvidia is the most valuable company in the world. Ultimately all these investment rounds for LLM companies are just going to be spent on Nvidia products. |
|
| ▲ | scottLobster 3 days ago | parent | prev | next [-] |
| Roughly 1% of US GDP in 2025 was data center construction, mostly for AI. |
|
| ▲ | illiac786 2 days ago | parent | prev | next [-] |
| I sincerely hope this whole LLM monetization scheme crashes and burns on these companies. I really hope we can get to a point where modest hardware achieves similar results for most tasks and this insane amount of hardware is only required for the most complex requests, which will be rarer, thereby killing the business case. I would dance the Schadenfreude Opus in C major if that became the case. |
|
| ▲ | huevosabio 3 days ago | parent | prev | next [-] |
| Instead of enriching uranium we're enriching weights! |
|
| ▲ | risyachka 3 days ago | parent | prev | next [-] |
| >> The compute moat is getting absolutely insane. How so? DeepSeek and others do models on par with the previous generation for a tiny fraction of the cost. Where is the moat? |
|
| ▲ | lz400 2 days ago | parent | prev | next [-] |
| That’s probably what the companies spending the money think: that they’re building a huge moat. There’s an alternative view: if there’s a bubble, if all these companies are spending huge sums on something that doesn’t return much on the investment, and if the models plateau and eventually smaller, cheaper, self-runnable open-source versions get 90% of the way there, what’s going to happen to that moat? And to the companies that overspent so much? This article is a good example of the bear case:
https://www.honest-broker.com/p/is-the-bubble-bursting |
|
| ▲ | asveikau 3 days ago | parent | prev | next [-] |
| This sounds terrible for the environment. |
|
| ▲ | puchatek 3 days ago | parent | prev | next [-] |
| And how much will one query cost you once the companies start to try and make this stuff profitable? |
|
| ▲ | itronitron 2 days ago | parent | prev | next [-] |
| Hmm, I wonder how much bitcoin someone could mine with that amount of compute. |
| |
| ▲ | wiredpancake 2 days ago | parent [-] | | A lot, but maybe a lot less than you expect. You'd be competing with ASIC miners, which are 100x more cost-effective per MH/s. You don't need 100,000 GB of VRAM when GPU mining, so it's wasted. |
|
|
| ▲ | xbmcuser 3 days ago | parent | prev | next [-] |
| This is why I keep harping on the world needing China to get competitive on node size and crash the market. They are already making energy from solar and renewables practically free. So the world needs AI to get out of the hands of the rich few and into the hands of everyone. |
|
| ▲ | sjapkee 2 days ago | parent | prev | next [-] |
| The biggest problem is that the result isn't worth the resources spent. |
|
| ▲ | rich_sasha 3 days ago | parent | prev | next [-] |
| It's the SV playbook: invent a field, make it indispensable, monopolise it and profit. It still amazes me that Uber, a taxi company, is worth however many billions. I guess for the bet to work out, it kinda needs to end in AGI for the costs to be worth it. LLMs are amazing but I'm not sure they justify the astronomical training capex, other than as a stepping stone. |
| |
| ▲ | lotsofpulp 2 days ago | parent | next [-] | | Why would a global taxi/delivery broker not be worth billions? Their most recent 10-Q says they broker 36 million rides or deliveries per day. Even profiting $1 on each of those would result in a company worth billions. | |
| ▲ | simianwords 2 days ago | parent | prev [-] | | The SV playbook has been to make sustainable businesses. Uber makes profits, and so do Google, Amazon and other big tech. > LLMs are amazing but I'm not sure they justify the astronomical training capex, other than as a stepping stone. They can just... stop training today and quickly recoup the costs because inference is mostly profitable. | |
| ▲ | rich_sasha 2 days ago | parent [-] | | All these businesses looked incredibly unsustainable for a long time. Uber was a cash shredder. Amazon didn't turn a profit for years, IIRC. They became profitable essentially by becoming quasi-monopolies. Indeed, LLM companies likely turn operating profits, but I'm not sure that alone justifies their valuations. It's one thing to make money, it's another to make a return for investors. And sure, valuations are growing faster than you can blink. Time will show if this in turn is justifiable or a bubble. | | |
| ▲ | filoleg 2 days ago | parent [-] | | Cannot speak for the rest, but the whole "Amazon didn't turn a profit for years" line (as an argument that their profitability now comes solely through quasi-monopoly routes) is incredibly misleading and borders on disingenuous. Since before AWS was even a thing, Amazon was already generating great revenue and could've easily just stopped expanding and investing in company growth, and it would have been easily profitable. Instead, Amazon decided to reinvest all their potential profits into growth/expansion (with the favorable tax treatment on top) at the expense of keeping the cash profits. At any given point, Amazon could've stopped reinvesting all potential profits into their growth, and they would have been instantly profitable. This is not the same as Uber, which ran their core service operations at a net loss (and was only cheap due to their investors eating the difference and hoping that Uber would eventually figure out how to not lose money on operating their core service). | |
| ▲ | rich_sasha 2 days ago | parent [-] | | Ok, we can debate on Uber, but your take on Amazon is very similar to today's LLM providers. They too are making good revenue on their product, but put so much cash into growth that they at least appear to be running at a loss. |
|
|
|
|
|
| ▲ | throw310822 3 days ago | parent | prev | next [-] |
| Just in case, can they be repurposed for bitcoin mining? :) Edit: for the curious, no. An H100 costs ~$25k and produces about $1.2/day mining bitcoin, without factoring in electricity. |
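Using the figures in that comment, the hardware payback alone works out to decades, before electricity:

```python
h100_price = 25_000     # $ per card, per the comment above
usd_per_day = 1.2       # gross mining revenue, per the comment above
print(f"payback ≈ {h100_price / usd_per_day / 365:.0f} years")  # about 57 years
```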
| |
| ▲ | wmf 3 days ago | parent | next [-] | | There are other coins that are less unprofitable to mine (see https://whattomine.com/gpus ) but it's probably still not worth it. | |
| ▲ | krupan 3 days ago | parent | prev [-] | | Before your edit I was going to answer, sadly no, they can't even be repurposed for Bitcoin mining. |
|
|
| ▲ | paulddraper 3 days ago | parent | prev | next [-] |
| Reductive. Doesn’t explain Deepseek. |
| |
| ▲ | FergusArgyll 3 days ago | parent [-] | | The DeepSeek story was way overblown. Read the gpt-oss paper: the actual training run is not the only expense. You have multiple experimental training runs as well as failed training runs. Plus, they were behind SOTA even then. |
|
|
| ▲ | lofaszvanitt 3 days ago | parent | prev [-] |
| Nvidia needs to grow. |