mmmllm 5 days ago

The same week, Oracle is forecasting huge data center demand and the stock is rallying. If these 10x gains in efficiency hold true, this could lead to a lot less demand for Nvidia, Oracle, CoreWeave, etc.

amelius 5 days ago | parent | next [-]

https://en.wikipedia.org/wiki/Jevons_paradox

mmmllm 5 days ago | parent [-]

Sure but where is the demand going to come from? LLMs are already in every google search, in Whatsapp/Messenger, throughout Google workspace, Notion, Slack, etc. ChatGPT already has a billion users.

Plus penetration is already very high in the areas where they are objectively useful: programming, customer care etc. I just don't see where the 100-1000x demand comes from to offset this. Would be happy to hear other views.

philipp-gayret 5 days ago | parent | next [-]

If LLMs were next to free and faster, I would personally increase my consumption 100x or more, and I'm only in the "programming" category.

vessenes 5 days ago | parent | prev | next [-]

We are nearly infinitely far away from saturating compute demand for inference.

Case in point: I'd like something that assesses in real time all the sensors and API endpoints of the stuff in my home and, as needed, bubbles up summaries, diaries, and emergency alerts. Right now that's probably a single H200, and well out of my "value range". The number of people in the world doing this at scale right now is almost certainly less than 50k.

If that inference cost went to 1%, then a) I'd be willing to pay it, and b) there'd be enough of a market that a company could make money integrating a bunch of tech into a simple deployable stack, and therefore c) a lot more people would want it, likely enough to drive more than 50k H200s worth of inference demand.
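
For concreteness, here's a rough sketch of the kind of loop I mean. Everything in it is a placeholder: read_sensors(), the endpoint URL, and the model name are assumptions, not a real deployment.

```python
# Hypothetical sketch: poll home sensors and ask a local LLM to triage them.
# read_sensors(), the endpoint, and the model name are placeholders.
import time
import requests

LLM_URL = "http://localhost:8000/v1/chat/completions"  # any OpenAI-compatible server

def read_sensors() -> dict:
    """Placeholder: would pull thermostat, cameras, power meter, etc."""
    return {"thermostat_c": 21.5, "front_door": "closed", "power_w": 430}

def triage(readings: dict) -> str:
    prompt = (
        "You monitor a home. Given these sensor readings, reply with either "
        f"'OK', a one-line diary entry, or 'ALERT: <reason>'.\n{readings}"
    )
    resp = requests.post(LLM_URL, json={
        "model": "local-model",
        "messages": [{"role": "user", "content": prompt}],
    }, timeout=30)
    return resp.json()["choices"][0]["message"]["content"]

while True:
    verdict = triage(read_sensors())
    if verdict.startswith("ALERT"):
        print(verdict)  # a real setup would push a notification instead
    time.sleep(60)
```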

mtone 4 days ago | parent | next [-]

Do you really need an H200 for this? It seems like something a consumer GPU could do. Smaller models might be ideal [0], as they don't require extensive world knowledge and are much more cost-efficient/faster.

Why can't you build this today?

[0]: https://arxiv.org/pdf/2506.02153 Small Language Models are the Future of Agentic AI (Nvidia)
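
For example, a small instruct model served locally would slot straight into that kind of loop. This assumes an Ollama install with its default OpenAI-compatible endpoint on port 11434 and an arbitrary small model already pulled; both are assumptions, not a recommendation:

```python
# Assumption: a small model is already pulled into a local Ollama instance,
# which exposes an OpenAI-compatible endpoint on port 11434 by default.
import requests

resp = requests.post(
    "http://localhost:11434/v1/chat/completions",
    json={
        "model": "qwen2.5:3b",  # any small instruct model a consumer GPU can hold
        "messages": [{"role": "user",
                      "content": "Thermostat reads 35C at 3am. OK or ALERT?"}],
    },
    timeout=30,
)
print(resp.json()["choices"][0]["message"]["content"])
```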

OtherShrezzing 4 days ago | parent | prev | next [-]

Is all of that not achievable today with things like Google Home?

It doesn’t sound like you need to run an H200 to bridge the gap between what currently exists and the outcome you want.

mmmllm 4 days ago | parent | prev | next [-]

Sure but if that inference cost went to 1%, then Oracle and Nvidia's business model would be bust. So you agree with me?

taminka 5 days ago | parent | prev [-]

absolutely nobody wants or needs a fucking thermostat diary lmao, and the few ppl that do will have zero noticeable impact on the world's compute demands, i'm begging ppl on hn to touch grass or speak to an average person every now and then lol

pessimizer 4 days ago | parent | next [-]

You wouldn't even know that it existed, or how it worked. It would just work. Everybody wants hands off control that they don't have to think or learn about.

edit: this reminds me of a state agency I once worked for that fired their only IT guy after they moved offices, because the servers were running just fine without him. It was a Kafkaesque trauma for him for a moment, but it turned into a massive raise a week later when they renegotiated for him to come back.

arscan 5 days ago | parent | prev [-]

It's pretty easy to dispute and dismiss a single use case for indiscriminate/excessive use of inference to achieve some goal, as you have done here, but it's hard to dispute every possible use case.

jjcm 4 days ago | parent | prev | next [-]

As plenty of others have mentioned here, if inference were 100x cheaper, I would run 200x inference.

There are so many things you can do with long running, continuous inference.

sipjca 4 days ago | parent [-]

but what if you don't need to run it in the cloud

ukuina 4 days ago | parent [-]

You will ALWAYS want to use the absolute best model, because your time is more valuable than the machine's. If the machine gets faster or more capable, your value has jumped proportionally.

idopmstuff 5 days ago | parent | prev | next [-]

> Plus penetration is already very high in the areas where they are objectively useful: programming, customer care etc.

Is that true? The BLS estimate of customer service reps in the US is 2.8M (https://www.bls.gov/oes/2023/may/oes434051.htm), and while I'll grant that's from 2023, I would wager a lot that the number is still above 2M. Similarly, the overwhelming majority of software developers haven't lost their jobs to AI.

A sufficiently advanced LLM will be able to replace most, if not all of those people. Penetration into those areas is very low right now relative to where it could be.

mmmllm 5 days ago | parent [-]

Fair point - although there are already plenty of customer-facing chatbots using LLMs rolled out. Zendesk, Intercom, Hubspot, and Salesforce Service Cloud all have AI features built into their workflows. I wouldn't say penetration is near the peak, but it's also not early stage at this point.

In any case, AI is not capable of fully replacing customer care. It will make it more efficient, but the non-deterministic nature of LLMs means that they need to be supervised for complex cases.

Besides, I still think even the inference demand for customer care or programming will be small in the grand scheme of things. EVERY Google search (and probably every gmail email) is already passed through an LLM - the demand for that alone is immense.

I'm not saying demand won't increase, I just don't see how demand increases so much that it offsets the efficiency gains to such an extent that Oracle etc are planning tens or hundreds of times the need for compute in the next couple of years. Or at least I am skeptical of it to say the least.

mirekrusin 5 days ago | parent | prev | next [-]

We've seen several orders of magnitude of improvement in CPUs over the years, yet if you try to do anything now, the interaction is often slower than it was on a ZX Spectrum. We can easily absorb an order-of-magnitude improvement, and that's only going to create more demand. We can/will have models thinking for us all the time, in parallel, and bothering us only with findings/final solutions. There is no limit here, really.

theptip 4 days ago | parent | prev | next [-]

I’m already throughput-capped on my output via Claude. If you gave me 10x the token/s I’d ship at least twice as much value (at good-enough for the business quality, to be clear).

There are plenty of usecases where the models are not smart enough to solve the problem yet, but there is very obviously a lot of value available to be harvested from maturing and scaling out just the models we already have.

Concretely, the $200/mo and $2k/mo offerings will be adopted by more prosumer and professional users as the product experience becomes more mature.

lanza 4 days ago | parent | prev | next [-]

The difference in usefulness between ChatGPT free and ChatGPT Pro is significant. Turning up compute for each embedded usage of LLM inference will be a valid path forward for years.

adgjlsfhk1 5 days ago | parent | prev | next [-]

The problem is that unless you have efficiency improvements that radically alter the shape of the compute-vs-smartness curve, more efficient compute just gets spent on much smarter models running at worse efficiency.

amelius 5 days ago | parent | prev | next [-]

If you can make an LLM solve a problem but from 100 different angles at the same time, that's worth something.
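
Roughly what I mean, as a sketch: fire off many high-temperature samples in parallel and have one final call pick the best. It assumes the openai Python package against any OpenAI-compatible backend; the model name, the sample count, and the merge step are arbitrary choices.

```python
# Sketch of "same problem, many angles": N parallel high-temperature samples,
# then one final call to pick or merge the best answer.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads the API key / base URL from the environment
PROBLEM = "Design a cheap way to cut idle power use in a small office."

async def one_angle(i: int) -> str:
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",                 # placeholder model name
        messages=[{"role": "user", "content": f"Approach #{i}: {PROBLEM}"}],
        temperature=1.0,                     # encourage diverse angles
    )
    return resp.choices[0].message.content

async def main(n: int = 100) -> str:
    drafts = await asyncio.gather(*(one_angle(i) for i in range(n)))
    merge = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
                   "Pick the single best of these proposals:\n\n" + "\n---\n".join(drafts)}],
    )
    return merge.choices[0].message.content

print(asyncio.run(main()))
```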

mmmllm 5 days ago | parent [-]

Isn't that essentially how the MoE models already work? Besides, if that were infinitely scalable, wouldn't we have a subset of super-smart models already at very high cost?

Also, this would only apply to very few use cases. For a lot of basic customer care work, programming, and quick research, I would say LLMs are already quite good without running them 100X.

mcrutcher 5 days ago | parent | next [-]

MoE models are pretty poorly named since all the "experts" are "the same". They're probably better described as "sparse activation" models. MoE implies some sort of "heterogeneous experts" that a "thalamus router" is trained to use, but that's not how they work.
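
A toy top-k gating layer makes the point: every "expert" below is structurally identical, and the router only decides which few get activated per token. Sizes and names are made up for illustration; this is not any lab's actual architecture.

```python
# Toy sparse-activation ("MoE") layer: homogeneous experts, top-k router.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySparseFFN(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        # Every "expert" has the exact same shape; only the learned weights differ.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                        # x: (tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # mixing weights for the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e         # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = ToySparseFFN()
print(layer(torch.randn(10, 64)).shape)          # torch.Size([10, 64])
```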

amelius 5 days ago | parent | prev | next [-]

> if that were infinitely scalable, wouldn't we have a subset of super-smart models already at very high cost

The compute/intelligence curve is not a straight line. It's probably more a curve that saturates, at like 70% of human intelligence. More compute still means more intelligence. But you'll never reach 100% human intelligence. It saturates way below that.

eMPee584 4 days ago | parent [-]

How would you know it converges on human limits? Why wouldn't it be able to go beyond, especially if it gets its own world-sim sandbox?

amelius 4 days ago | parent [-]

I didn't say that. It converges well below human limits. That's what we see.

Thinking it will go beyond human limits is just wishful thinking at this point. There is no reason to believe it.

mirekrusin 5 days ago | parent | prev [-]

MoE is something different - it's a technique to activate just a small subset of parameters during inference.

Whatever is good enough now can be much better for the same cost (time, computation, actual cost). People will always choose better over worse.

mmmllm 5 days ago | parent [-]

Thanks, I wasn't aware of that. Still - why isn't there a super expensive OpenAI model that uses 1,000 experts and comes up with way better answers? Technically that would be possible to build today. I imagine it just doesn't deliver dramatically better results.

Leynos 4 days ago | parent [-]

That's what GPT-5 Pro and Grok 4 Heavy do. Those are the ones you pay triple digit USD a month for.

takinola 4 days ago | parent | prev | next [-]

I mean 640KB should be enough for anyone too but here we are. Assuming LLMs fulfill the expected vision, they will be in everything and everywhere. Think about how much the internet has permeated everyday life. Even my freaking toothbrush has WiFi now! 1000x demand is likely several orders of magnitude too low in terms of the potential demand (again, assuming LLMs deliver on the promise).

sauwan 5 days ago | parent | prev [-]

Long running agents?

ls65536 5 days ago | parent | prev | next [-]

I'm not going to speculate about what might be ahead with regard to Oracle's forecasting of data center demand, but regarding the idea of efficiency gains leading to lower demand, don't you think something like Jevons paradox might apply here?

Voloskaya 5 days ago | parent | prev | next [-]

People said the same thing for deepseek-r1, and nothing changed.

If you come up with a way to make the current generation of models 10x more efficient, then everyone just moves to train a 10x bigger model. There isn't a model size at which the players are going to be satisfied and not go 10x bigger. Not as long as scaling still pays off (and it does today).

stingraycharles 5 days ago | parent | prev | next [-]

Absolutely not; the trends have proven that people will just pay for the best quality they can get, and keep paying roughly the same money.

Every time a new model is released, people abandon the old, lower quality model (even when it’s priced less), and instead prefer to pay the same for a better model.

The same will happen with this.

mmmllm 5 days ago | parent | next [-]

Sure, but the money people are paying right now isn't that much in the grand scheme of things. OpenAI is expecting $13bn in revenue this year; AWS made over $100bn last year. So unless they pay a lot more, or find customers outside of programmers, designers, etc. who are willing to pay for the best quality, I don't see how it grows as fast as it needs to. (I'm not saying it won't increase, just not at the rate expected by the data center providers.)

clvx 5 days ago | parent | prev | next [-]

For early adopters, yes, but many systems have been running as "good enough" without any kind of updates for a long time. For many use cases it just needs to reach a point where accuracy is good enough, and then it will be set and forget. I disagree with the approach, but that's what you find in the wild.

Zambyte 5 days ago | parent | prev [-]

The best quality you can get is at odds with the best speed you can get. There are lots of people (especially with specific use cases) who will pay for the best speed they can get that is high enough quality.

thinkingemote 5 days ago | parent | prev | next [-]

If someone had to bet on an AI crash, which I imagine would lead to unused datacentres and cheap GPUs, how would they invest their winnings to exploit these resources?

CuriouslyC 5 days ago | parent | next [-]

If the price of inference drops through the floor all the AI wrapper companies become instantly more valuable. Cursor is living on borrowed time because their agents suck and they're coasting on first mover advantage with weak products in general, but their position would get much better with cheap inference.

sunir 4 days ago | parent | prev | next [-]

Buy the application layer near winners. When computing costs shrink, usage expands.

kridsdale3 5 days ago | parent | prev [-]

Assuming your question isn't rhetorical, massive Oracle Crypto Farm.

ACCount37 5 days ago | parent | prev | next [-]

No. The gains in inference and training efficiency are going to be absorbed by frontier LLM labs being more willing to push more demanding and capable models to the end users, increase reasoning token budgets, etc.
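
For example, the same request at a higher reasoning budget simply burns more tokens per answer, which is where freed-up capacity tends to go. The snippet below assumes the OpenAI-style reasoning_effort knob and a placeholder model name; parameter names differ by provider, so treat it as illustrative only.

```python
# Same prompt at increasing reasoning budgets: efficiency gains get eaten by
# turning this knob up. Model name is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment
for effort in ("low", "medium", "high"):
    resp = client.chat.completions.create(
        model="o4-mini",                    # any reasoning-capable model
        reasoning_effort=effort,
        messages=[{"role": "user", "content": "Plan a week of meals under $40."}],
    )
    print(effort, resp.usage.completion_tokens)  # token spend grows with effort
```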

jstummbillig 5 days ago | parent | prev | next [-]

For the last 2 years, despite all efficiency gains, I have literally been watching characters appear on my screen, as if this were a hacker movie. Lately, I am also waiting at least 60s for anything to appear at all.

If that happened at 10x the speed, it would still be slow in computer terms, and that increasingly matters, because I will not be the one reading the stuff – it will be other computers. I think looking back a few years from now, every single piece of silicon that is planned right now will look like a laudable but laughable drop in the ocean.

mdp2021 5 days ago | parent | prev [-]

The quality that real demand needs is not there yet, so more processing is very probably needed, and efficiency gains may allow that extra processing.

(A striking example of real quality demands, read today: the administration of Albania wants some sort of automated Cabinet Minister. Not just an impartial and incorruptible algorithm (what we normally try to do with deterministic computation): a "minister". Good luck with that.)