amelius a day ago

> Now if that means RAM prices come down (as speculated, not reported on, in the link) or the AI companies just do more things with their extra ram is yet to be determined.

I think it is determined:

https://en.wikipedia.org/wiki/Jevons_paradox

woadwarrior01 a day ago | parent | next [-]

Yeah, even if one efficiency trick lands, people will end up spending the saved budget right back on bigger models, and/or more "thinking" tokens.

EthanHeilman a day ago | parent [-]

Not if the bigger models have diminishing returns. Let's say you figure out a way to reduce RAM requirements 100x, but increasing RAM usage by 2x only gets you a 1% increase in effectiveness, and 3x gets you no noticeable increase over 2x at all. Sure, you can reduce the price per token, but you might have already saturated the market. Even if you haven't, your hardware-based moat just got smaller, and that is going to reduce your margins even more.
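A minimal sketch of that argument in Python, using the hypothetical effectiveness numbers from this comment (the `quality` curve is an assumption for illustration, not measured data):

```python
# Hypothetical effectiveness curve from the comment: doubling RAM
# adds 1% effectiveness; tripling adds nothing noticeable over 2x.
def quality(ram_multiple: float) -> float:
    """Effectiveness score at a given multiple of baseline RAM."""
    if ram_multiple >= 2:
        return 101.0  # 2x RAM or more: only a 1% gain over baseline
    return 100.0      # 1x RAM: baseline effectiveness

# Marginal gains flatten out fast, so a 100x cost reduction cannot
# simply be re-spent on usefully bigger models.
print(quality(2) - quality(1))  # 1.0 (going 1x -> 2x)
print(quality(3) - quality(2))  # 0.0 (going 2x -> 3x)
```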

Just noticed that pydry made a similar point: https://news.ycombinator.com/item?id=47574216

pydry a day ago | parent | prev [-]

Jevons paradox only applies if demand hasn't already been saturated.

The fact that public LLM usage is leveling off at a price of $0 and Jensen "we make the shovels in this gold rush" Huang is rather desperately claiming that you need to spend $250k/year in tokens to be taken seriously suggests that demand saturation may not be that far off.

Whether Jevons' paradox applies to software engineers is, I think, another open question. I'm constantly being told that it doesn't and that LLMs make half of us redundant now, but I'm skeptical - so much of the automation I see is broken or badly done.

raincole a day ago | parent | next [-]

It is quite hard to imagine that demand is saturated now. I think any company that uses even a sliver of AI would happily increase its token consumption 100x if it were free.

flir a day ago | parent | next [-]

Are you assuming a brute force "burn tokens until it passes the tests" model, or is there a really sweet approach on the horizon that is impractical at current token costs?

I'm asking 'cos while I'm philosophically opposed to the first option, I'd love to hear about anything that resembles the second.

SpicyLemonZest a day ago | parent | prev [-]

One idea I've heard is prototype-first design reviews. If the cost of code genuinely trends to zero, there's no reason why most technical disagreements about product functionality couldn't come with prototypes to illustrate each side of the debate. Today, that's not always practical between token costs and usage limits.

pydry 21 hours ago | parent [-]

What if the agent fucks up the better approach but does a good job of the worse approach?

SpicyLemonZest 20 hours ago | parent [-]

Then hopefully the reviewers will notice that the first prototype's flaws are correctable. Sometimes they won't, and they'll end up making a bad decision, just as they sometimes make bad decisions today with no prototypes to look at. But having prototypes allows for a lot of debates that are today vague and meandering to be reduced to "which of these assertions at the end of this integration test do you think is the correct behavior?".

pydry a day ago | parent | prev [-]

Executive FOMO disease is being exploited by the model providers to push for maximal token usage even when it is pointless.

This includes encouraging people to set up elaborate multi-model setups (e.g. "gas town") for coding that do not meaningfully improve productivity but which certainly do cause token usage to explode.

It also includes encouraging execs to use token consumption as a proxy for productivity - almost akin to SLOC.

AI has a halo right now and the managerial class seem to be willing to forgive almost any failure because the promise is so enticing. We're at peak expectations right now. They will soon start to be less forgiving when the warts which are intrinsic to LLMs remain unsolved.

monknomo a day ago | parent | next [-]

nobody know how to measure software productivity + ai is supposed to mean productivity goes up = more ai means more productivity

As best as I can tell, that's the thinking. It's one number, it's very easy to find and manage, and there is a belief that it directly measures productivity.

I disagree that it does; it seems to me the throughput of useful features is a better measure, but I'm not in the driver's seat on this one.

irke a day ago | parent [-]

Incremental revenue and cost savings, at least for enterprises, are where it would show up. There's also a present-value consideration: if LLMs make those dollars come into existence closer to the present, they are worth more.
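The present-value point can be made concrete with standard discounting; the $1M cash flow and 10% rate below are assumed numbers for illustration:

```python
# Discount a future cash flow back to today: the same dollars are
# worth more the sooner they arrive.
def present_value(cash_flow: float, years: float, rate: float) -> float:
    return cash_flow / (1 + rate) ** years

# $1M of incremental revenue at a 10% discount rate:
print(round(present_value(1_000_000, 1, 0.10)))  # 909091 (arrives in year 1)
print(round(present_value(1_000_000, 3, 0.10)))  # 751315 (arrives in year 3)
```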

The personal use case stuff is messy and subjective.

monknomo a day ago | parent [-]

attributing incremental revenue to gross engineering effort is challenging, imo.

Cost savings is primarily a function of headcount here. Which is also easy to measure, and so if we take my thesis that easy to measure stuff is prioritized...

irke a day ago | parent | prev [-]

Yep - it's impossible to separate experimental tokens from value-creating ones.

Ultimately the performance will be assessed via the income statement and cash flows of customers of the model producers.

Frankly in the window pre-IPO it’s in the best interests of OAI et al to show a line going to the top-right in relation to tokens, in their prospectus. What does that mean?

Strategic manipulation.

veunes 7 hours ago | parent | prev | next [-]

"Demand is stagnating" only applies to the B2C segment, where people are already bored of generating poems and funny pictures. In B2B, the demand hasn't even started yet, because corporations are still terrified of shoving their NDA-covered data into public APIs. The second local models and secure private clouds get cheaper, the enterprise is going to devour literally any amount of available compute just to automate internal document workflows.

Marha01 a day ago | parent | prev | next [-]

Demand for top models is definitely not saturated, at least when it comes to programming. If I could afford to use 5x more Claude Opus 4.6 tokens, I would!

hajile a day ago | parent | next [-]

Demand is relative. How many Claude tokens would you buy if they had a 10x price hike?

The market has achieved its current saturation level with loss-leader prices that remind me of the Chinese bike-share bubble[0]. Once those prices go up to break-even levels (let alone profitable ones), the number of people who can afford to pay will go down dramatically (and that's not even accounting for the bubble pop further constricting people's finances).

[0] https://www.youtube.com/watch?v=FQrEDq8KPiU

pigpop a day ago | parent | next [-]

If they've already built themselves a loyal customer base (which is usually the point of fighting a price war) and the customers are happy with the technology they have, then when funding is tight and turning a profit matters more, why wouldn't they pivot to optimizing inference: stopping further training, freezing the model versions, burning the weights into silicon, building better caching strategies, and improving the harnesses and tools that lower their costs and increase their margins?

If all they do is hike prices then they'll lose customers to competitors who don't or who find a way to serve a similar model cheaper.

The demand isn't going to go away purely through higher prices. Once people know something is possible they will demand it whether supply is constrained or not. That's a huge bounty for anyone who can figure out how to service that demand.

philistine a day ago | parent [-]

Easier said than done. What you're describing can take years to implement. Can OpenAI et al. keep burning cash at the same rate for two years while they wait for the salvation of custom silicon if the investments dry up?

eru 13 hours ago | parent [-]

They could stop further training right this very second.

HDThoreaun a day ago | parent | prev [-]

There is no evidence that the labs are losing money on inference subscriptions. The labs have massive fixed costs, but as long as inference revenue is higher than the cost of the datacenters they use for inference, all they need to do to become profitable is scale up. Right now software engineers are basically the only ones actually paying for inference; the labs just need to create coding-assistant-style products for everything, good enough that every white-collar worker in the country (world?) is paying a $1000/yr subscription. Certainly there's a lot of risk - will models become commoditized so everyone switches to open models? Can they actually get non-software-engineers to pay for inference en masse? But it's not like there's no path.

fatata123 a day ago | parent | prev [-]

[dead]

zozbot234 a day ago | parent | prev | next [-]

> The fact that public LLM usage is leveling off at a price of $0

The price is very much not $0; even 'free' models have usage capacity limits that equate to a shadow price.

adventured a day ago | parent | prev | next [-]

LLMs haven't remotely begun to be integrated into the lives of the typical person. Not even close. The typical person is using LLMs not at all as it pertains to their daily life tasks. They're using them almost entirely for limited discussion matters (e.g. having a discussion with GPT about a medical issue, or a work-related matter).

This is the first or second inning in the LLM rollout. It'll take 15-20 more years for full integration of AI agents into the life of the typical person.

The claw experiments for example can just barely be considered alpha stage. They're early AI garbage unfit for the average person to utilize safely. That new world hasn't gotten near the typical person yet.

The compute required to get to full integration of AI agents into the lives of average people - billions of them - is far beyond 10x where we're at now.

pizlonator a day ago | parent | next [-]

> LLMs haven't remotely begun to be integrated into the lives of the typical person. Not even close. The typical person is using LLMs not at all as it pertains to their daily life tasks. They're using them almost entirely for limited discussion matters

This is an argument in favor of demand having leveled off.

pigpop a day ago | parent [-]

Only if nothing changes. Right now, people are running agent frameworks like OpenClaw on their own hardware or a VPS, and the frameworks are often single-person projects. This results in all sorts of problems, but you can pick an easy solution from history: create a walled-garden service for running these agents, where you can provide security and standardization. If that platform also allows trusted services to integrate, then they can provide end-to-end security guarantees. They also benefit from improvements to the models themselves making them more difficult to subvert. Creating something that is secure enough for the average person to entrust their credit card to is not an impossible task.

pydry a day ago | parent | prev [-]

>The typical person is using LLMs not at all as it pertains to their daily life tasks.

This doesn't track at all with my experience. Everybody is using it everywhere.

Moreover, people are using them for daily life tasks even when it is not an appropriate use of LLMs - e.g. getting medical advice, as you mentioned, or writing emails that are clearly pissing off their coworkers.

In this respect I see it as akin to radium - a new technology that got a little too fashionable for its own good when it first emerged and which will likely have many use cases scaled back.

TheScaryOne a day ago | parent | next [-]

>Everybody is using it everywhere.

No one in our Auto shop is using AI. One of the new diagnostic tools was demo'd with AI, and none of us were having it. It's about as accurate as Googling your symptoms.

My mother had an AI powered lung scan that came back with Stage 4 Cancer. The Oncologist got called in (for a fee!) to tell us it was just early stage COPD.

user34283 a day ago | parent | prev | next [-]

In my experience people vastly overestimate the competence of doctors. Getting medical advice from LLMs could be life saving.

Personally I experienced this when a specialized doctor believed a drug interaction to be the opposite, thinking A hinders the absorption of B, when actually it hinders the clearance, tripling concentration of B.

Without AI, I would have been clueless about this and could not have spotted the mistake. I don't know if it would truly have been critical, but it did shake my confidence in doctors.

PAndreew 14 hours ago | parent [-]

This^^ Use both, they have their own strengths and weaknesses.

eru 13 hours ago | parent [-]

And the AIs are still getting better at a good clip. I'm not so sure about (unassisted) doctors.

HDThoreaun a day ago | parent | prev [-]

> getting medical advice

I'd be careful stating this is an inappropriate use of LLMs. I'm semi tapped in to the medical literature community, and there is a lot of serious discussion and research going into the use of LLMs for medical advice; most of it is showing that LLMs are barely worse than doctors, and much, much cheaper and more convenient. They definitely aren't ready to completely replace doctors, but it seems they can provide competent medical advice in a pinch. Look out for the literature on this in the coming year; it's only in the last few months that researchers seem to be taking LLMs seriously.

Delphiza 2 hours ago | parent | next [-]

I am surprised that people are surprised by this finding, and support your position.

Anecdotally, doctors get things wrong quite frequently. Almost everybody has a bad medical diagnosis/advice story. The amount of reference material a doctor needs to know off-hand, and the limited data they are given to make a diagnosis, make it a really difficult job. They also seldom have the ability to know whether their diagnosis/treatment worked, so have a limited ability to 'learn' from outcomes. (I did some work for cancer research, and one of the most difficult problems was trying to get 'end of treatment' data, because the end of treatment was often a death unknown to the researchers.)

The ability to have a 'prompt' that includes lab data is likely to be better than the opinion of a doctor who has only one person's professional experience, limited ability to interpret 'prompts', and a need to map it all to an in-memory conditions database.

checkyoursudo a day ago | parent | prev | next [-]

This seems ripe for a joke akin to "how was the food?" "bad, but at least the portions were big!"

Like, "how was the medical advice" "worse than a doc's, but at least it was cheaper!"

HDThoreaun a day ago | parent [-]

Well, the thing is that it often isn't worse than a doctor's; that's the point of the research here. I get that sounds crazy, just watch out for the coming literature I guess.

A significant portion of Americans detest the medical industry and deeply dislike going to the doctor, so I don't even think the product needs to be very good to disrupt the way the system works; just different and accessible is likely enough. Funnily enough, restaurants where the food is bad but the portions are big are actually decently popular. Priorities can vary so widely that many people are unable to even comprehend the priorities a significant number of other people truly hold.

d2ssa 20 hours ago | parent [-]

"deeply dislike going to the doctor"

No, you are not capturing the trade-off at all. And frankly, you have an agenda implicit in your posts; that's clear to see.

jrflowers a day ago | parent | prev [-]

> barely worse than doctors

I like that this comment is below, and posted after, an example where somebody had to pay extra money to clear up a misdiagnosis of stage 4 cancer by the “barely worse” software

HDThoreaun a day ago | parent [-]

There are many examples of doctors misdiagnosing a wide variety of things, which is largely the point here. People think of doctors as infallible when that is not even close to true.

I'm certainly not saying fire all the radiologists, just advising an open mind when the actual literature starts saying that LLMs are as good as doctors in some areas.

pydry 21 hours ago | parent [-]

There are many examples of people into homeopathy, Chinese medicine, and even witchcraft using an identical (not similar, identical) argument to the one you just used to push it.

d2ssa 20 hours ago | parent | next [-]

Legit that dude seems like a nutter. lol'd hard at "Im semi tapped in to the medical literature community."

jrflowers 15 hours ago | parent | prev [-]

Yeah that’s the pitch for Dianetics

vonneumannstan a day ago | parent | prev | next [-]

Pretty sure the entire markets for Storage, HBM, DDR5, etc are completely sold out for next several years. How is that saturated?

Analemma_ a day ago | parent | prev | next [-]

We’re not even close to demand saturation with tokens. Have you seen the people rending their garments with rage that Anthropic and Google won’t let them use their flat-rate subscriptions to burn millions of tokens per hour on OpenClaw? And that’s a tiny set of die-hard tinkerers.

The ceiling of token use when everyone has something akin to OpenClaw just running as a background process on their phone is way higher than there’s supply for right now. Jevons paradox is still in full force.

Macha a day ago | parent [-]

Is that not appealing to those users _because_ it's a subsidised flat rate? Those users could go and swap to API pricing right now if they wanted to, but at API pricing they don't want to.

Analemma_ 4 hours ago | parent [-]

Right, but that just proves there's tons of pent-up demand waiting in the wings as token prices fall.

kmeisthax a day ago | parent | prev [-]

I thought we were going to hit token saturation years ago, but they keep inventing new ways to use tokens. Like, instead of asking a chat model to write something and getting ~1000 tokens out of it, you now have an agent producing ~10,000 tokens - or, worse, spawning 10 subagents that collectively burn ~100,000 tokens. All for marginally better answers with significantly higher compute usage.
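Back-of-envelope, the escalation described above (all figures are this comment's rough orders of magnitude, not measurements):

```python
chat_answer = 1_000              # plain chat completion: ~1,000 tokens
agent_run = 10 * chat_answer     # agent loop: ~10,000 tokens
subagent_swarm = 10 * agent_run  # 10 subagents: ~100,000 tokens

# Each new interaction pattern multiplies token burn roughly 10x,
# so the agent-of-agents answer costs ~100x the chat answer.
print(subagent_swarm // chat_answer)  # 100
```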

Personally, I would have used all those tokens to generate synthetic data for IDA (iterated distillation and amplification) so that the more efficient 1000 token/answer chat model can answer more questions, but apparently that doesn't justify an insane datacenter buildout.

azinman2 a day ago | parent | next [-]

Everyone is interested in using fewer tokens to accomplish the same task.

user34283 a day ago | parent | prev [-]

Marginally better answers?

Claude Code and co. can now analyze an enterprise codebase to debug issues in a system with multiple services involved.

I don't see how that would have been possible at all in the past.