renegade-otter 2 days ago

They are not worse - the results just are not repeatable, which is a much worse problem.

Like with cab hailing, shopping, social media ads, food delivery, etc: there will be a whole ecosystem, workflows, and companies built around this. Then the prices will start going up with nowhere to run. Their pricing models are simply not sustainable. I hope everyone realizes that the current LLMs are subsidized, like your Seamless and Uber were in the early days.

IMTDb a day ago | parent | next [-]

A key difference is that the cost to execute a cab ride largely stayed the same. Gas to get you from point A to point B is ~$5, and there's a floor on what you can pay the driver. If your ride costs $8 today, you know that's unsustainable; it'll eventually climb to $10 or $12.

But inference costs are dropping dramatically over time, and that trend shows no signs of slowing. So even if a task costs $8 today thanks to VC subsidies, I can be reasonably confident that the same task will cost $8 or less without subsidies in the not-too-distant future.

Of course, by then we'll have much more capable models. So if you want SOTA, you might see the jump to $10-12. But that's a different value proposition entirely: you're getting significantly more for your money, not just paying more for the same thing.

lompad 19 hours ago | parent | next [-]

>But inference costs are dropping dramatically over time,

Please prove this statement; so far there is no indication that this is actually true - the opposite seems to be the case. Here are some actual numbers [0] (and whether you like Ed or not, his sources have so far always been extremely reliable).

There is a reason the AI companies never talk about their inference costs. They boast about every metric they can find, but inference costs... not so much.

[0]: https://www.wheresyoured.at/oai_docs/

patresh 17 hours ago | parent | next [-]

I believe OP's point is that for a given model quality, inference cost decreases dramatically over time. The article you linked talks about effective total inference costs which seem to be increasing.

Those are not contradictory: a company's inference costs can increase due to deploying more models (Sora), deploying larger models, doing more reasoning, and an increase in demand.

However, if we look purely at how much it costs to run inference on a fixed number of requests at a fixed model quality, I am quite convinced that inference costs are decreasing dramatically. Here's a model from late 2025 (see the Model performance section) [1] with benchmarks comparing a 72B-parameter model (Qwen2.5) from early 2025 to the late-2025 8B Qwen3 model.

The 9x smaller model outperforms the larger one from earlier the same year on 27 of the 40 benchmarks they were evaluated on, which is just astounding.

[1] https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct
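
As a back-of-the-envelope on why that matters for cost (a sketch, not a measurement: dense-transformer inference is roughly 2 FLOPs per parameter per token, ignoring attention and KV-cache overhead):

    # Rough per-token inference compute for a dense transformer:
    # ~2 FLOPs per parameter per token (ignores attention/KV overhead).
    def flops_per_token(params: float) -> float:
        return 2 * params

    ratio = flops_per_token(72e9) / flops_per_token(8e9)
    print(f"compute ratio: {ratio:.0f}x")  # -> 9x cheaper per token at matched quality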

academia_hack 12 hours ago | parent | prev [-]

++

Anecdotally, I find you can tell if someone worked at a big AI provider or a small AI startup by proposing an AI project like this:

" First we'll train a custom trillion parameter LLM for HTML generation. Then we'll use it to render our homepage to our 10 million daily visitors. "

The startup people will be like "this is a bad idea because you don't have enough GPUs for training that LLM" and the AI lab folks will be like "How do you intend to scale inference if you're not Google?"

forty a day ago | parent | prev | next [-]

What if we run out of GPUs? Out of RAM? Out of electricity?

AWS is already raising GPU prices; that never happened before. What if there is war in Taiwan? What if we want to get serious about climate change and start saving energy for vital things?

My guess is that, while they can do some cool stuff, we cannot afford LLMs in the long run.

jiggawatts a day ago | parent [-]

> What if we run out of GPU?

These are not finite resources being mined from an ancient alien temple.

We can make new ones, better ones, and the main ingredients are sand and plastic. We're not going to run out of either any time soon.

Electricity constraints are a big problem in the near-term, but may sort themselves out in the long-term.

twelvedogs a day ago | parent | next [-]

> main ingredients are sand and plastic

Kinda ridiculous point; we're not running into GPU shortages because we don't have enough sand.

renegade-otter 14 hours ago | parent | next [-]

We already had a sand shortage. In 2019...

https://www.bbc.com/future/article/20191108-why-the-world-is...

Craighead a day ago | parent | prev | next [-]

Even funnier, there are legitimate shortages of usable sand.

jiggawatts 18 hours ago | parent | prev [-]

That’s my point: the key inputs are not materials but the high-tech machinery and the skills to operate it.

Draiken 14 hours ago | parent [-]

Which is better because?

We can't copy/paste a new ASML no matter how hard we try (short of open-sourcing all of their IP). Even if we did, by the time one generation of machines has been copied, they're on a new generation, and the bottleneck is still in the same place.

Not to mention that with these monopolies they can just keep increasing prices ad infinitum.

jiggawatts 7 hours ago | parent [-]

ASML's secret sauce is not that secret or uncopyable. The Chinese are already working on their clone of the Twinscan tools.

Veritasium recently made a good video on the ASML machine design: https://youtu.be/MiUHjLxm3V0

The outcome may seem like magic, but the input is "simply" hard work and a big budget: billions of dollars and years of investment into tuning parameters like droplet size, frequency, etc.

The interviews make it clear that the real reason ASML's machines are (currently) unique is that few people had the vision, patience, and money to fund what seemed at the time impossible. The real magic was that ASML managed to hang on by a fingernail and get a successful result before the money ran out.

Now that tin droplet EUV lasers have not only been demonstrated to be possible, but have become the essential component of a hugely profitable AI chip manufacturing industry, obtaining funding to develop a clone will be much easier.

forty 21 hours ago | parent | prev [-]

If the US is ready to start a war against Europe to invade Greenland, it's certainly because they need more sand and plastic? Of course, by weight it's probably mostly sand and plastic, but the interesting bits probably need palladium, copper, boron, cobalt, tungsten, etc.

rhubarbtree 20 hours ago | parent | next [-]

Well, also for military purposes.

And general imperialism.

jiggawatts 19 hours ago | parent | prev [-]

Greenland is Trump’s Ukraine. He’s jealous of Putin, that is all.

There is nothing in Greenland worth breaking up the alliances with Europe over.

Trump is too stupid to realise this; he just wants land like it’s a Civ game.

PS: An entire rack of the most expensive NVIDIA equipment millions of dollars can buy has maybe a few grams of precious or rare metals in it. The cost of those is maybe a dollar or two. They don’t even use gold any more!

The expensive part is making it, not the raw ingredients.

imcritic 2 hours ago | parent | next [-]

That alliance costs money. It doesn't bring anything good in return: the USSR (which this alliance was created against) is long gone. Trump is a genius if he somehow manages to kill 2 birds with 1 stone: make the OTHER parties of the alliance want to disband it AND get a piece of land with a unique strategic position all to himself/the U.S.

I think it's Putin who is going to be jealous of Trump, not the other way around.

gylterud 13 hours ago | parent | prev [-]

One might then suspect that breaking up alliances with Europe is the point of the whole thing.

jiggawatts 6 hours ago | parent [-]

Some of the best advice I've ever heard is to look at how people act and ignore how they claim they act or their stated reasons for doing so.

A corollary is that even a "technically false" model can better predict someone's actions than a "truthful" one.

Trump may not be a Russian agent, but he acts like one consistently.

It's more effective to simply assume he's an agent of a foreign power, because that's the best predictor of his actions.

iwontberude a day ago | parent | prev | next [-]

Your point would make sense, except that the amount of inference per request is going up faster than the costs are going down.

supern0va a day ago | parent | next [-]

The parent said: "Of course, by then we'll have much more capable models. So if you want SOTA, you might see the jump to $10-12. But that's a different value proposition entirely: you're getting significantly more for your money, not just paying more for the same thing."

SOTA improvements have been coming from additional inference due to reasoning tokens and not just increasing model size. Their comment makes plenty of sense.

manmal a day ago | parent | prev [-]

Is it? Recent models tend to need fewer tokens to achieve the same outcome. The days of ultrathink are coming to an end; Opus is perfectly usable without it.

a day ago | parent [-]
[deleted]
SecretDreams a day ago | parent | prev | next [-]

> But inference costs are dropping dramatically over time, and that trend shows no signs of slowing. So even if a task costs $8 today thanks to VC subsidies, I can be reasonably confident that the same task will cost $8 or less without subsidies in the not-too-distant future.

I'd like to see this statement plotted against current trends in hardware prices at iso-performance. RAM, for example, is not meaningfully better than it was 2 years ago, and yet it is 3x the price.

I fail to see how costs can drop while valuations for all major hardware vendors continue to go up. I don't think the markets would price companies this way if they thought all major hardware vendors were going to see their margins shrink to commodity levels, as you've implied.

santadays a day ago | parent | next [-]

I've seen the following quote:

"The energy consumed per text prompt for Gemini Apps has been reduced by 33x over the past 12 months."

My thinking is that if Google can give away LLM usage (which is obviously subsidized), it can't be astronomically expensive - it must be somewhere in the realm of what we are paying for ChatGPT. Google has its own TPUs and a company culture oriented towards optimizing energy usage and hardware costs.

I tend to agree with the grandparent on this: LLMs will get cheaper for the level of intelligence we have now, and will get more expensive for SOTA models.

lelanthran a day ago | parent | next [-]

Google is a special case - ever since LLMs came out I've been pointing out that Google owns the entire vertical.

OpenAI, Anthropic, etc. are in a race to the bottom, but because they don't own the vertical they are beholden to Nvidia (for chips), they obviously have less training data, they need a constant influx of cash just to stay in that race to the bottom, etc.

Google owns the entire stack: they don't need Nvidia, they already have the data, they own the very important user info via tracking, they have millions, if not billions, of emails on which to train, etc.

Google needs no one, not even VCs. Their costs must be a fraction of the costs of pure-LLM companies.

viraptor a day ago | parent | next [-]

> OpenAI, Anthropic, etc are in a race to the bottom

There's a bit of nuance hiding in the "etc". OpenAI and Anthropic are still in a race for the top results. Minimax and GLM are in the race to the bottom while chasing good results - M2.1 is 10x cheaper than Sonnet, for example, but practically fairly close in capabilities.

lelanthran 18 hours ago | parent [-]

> There's a bit of nuance hiding in the "etc". OpenAI and Anthropic are still in a race for the top results.

That's not what is usually meant by "race to the bottom", is it?

To clarify, in this context I mean that they are all in a race to be the lowest margin provider.

They're at the bottom of the value chain - they sell tokens.

It's like being an electricity provider: if you buy $100 of electricity and produce 100 widgets, which you sell for $1k each, that margin isn't captured by the provider.

That's what being at the bottom of the value chain means.

viraptor 16 hours ago | parent [-]

I get what it means, but it doesn't look to me like they're racing there yet. They don't even care that people buy multiple highest-tier plans and rotate them every week, because they don't provide a high enough tier for those existing customers. I don't see any price war happening. We don't know what their real margins are, but I don't see the race there. What signs do you see that Anthropic and OpenAI are in a race to the bottom?

lelanthran 15 hours ago | parent [-]

> I don't see any price war happening. What signs do you see that Anthropic and OpenAI are in a race to the bottom?

There don't need to be signs of a race (or a price war), only signs of commodification; all you need is a lack of differentiation between providers for something to turn into a commodity.

When you're buying a commodity, there's no big difference between getting your commodity delivered by $PROVIDER_1 and getting your commodity delivered by $PROVIDER_2.

The models are all converging quality-wise. Right now the number of people who swear by OpenAI models is about the same as the number who swear by Anthropic models, which is about the same as the number who swear by Google's models, etc.

When you're selling a commodity, the only differentiation is in the customer experience.

Right now, sure, there's no price war, but almost everyone who is interested is playing with multiple models anyway. IOW, the target consumers are already treating LLMs as a commodity.

flyinglizard a day ago | parent | prev [-]

Gmail has 1.8B active users, each with thousands of emails in their inbox. The number of emails they can train on is probably in the trillions.

brokencode a day ago | parent [-]

Email seems like not only a pretty terrible training data set, since most of it is marketing spam with dubious value, but also an invasion of privacy, since information could possibly leak about individuals via the model.

palmotea a day ago | parent [-]

> Email seems like not only a pretty terrible training data set, since most of it is marketing spam with dubious value

Google probably even has an advantage there: filter out everything except messages sent from one valid Gmail account to another. If you do that, you drop most of the spam and marketing and keep mostly human-to-human interactions. And then they have their spam filters on top.
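
A minimal sketch of that filtering heuristic (the addresses and records are made up; a real pipeline would be far more involved):

    # Hypothetical filter: keep only mail sent between Gmail accounts,
    # which drops most bulk marketing sent from corporate domains.
    def is_human_to_human(sender: str, recipient: str) -> bool:
        return sender.endswith("@gmail.com") and recipient.endswith("@gmail.com")

    messages = [
        ("alice@gmail.com", "bob@gmail.com"),     # kept
        ("deals@shop.example", "bob@gmail.com"),  # dropped
    ]
    training_set = [m for m in messages if is_human_to_human(*m)]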

Terr_ a day ago | parent [-]

I'd upgrade that "probably" leak to "will absolutely" leak, albeit with some loss of fidelity.

Imagine industrial espionage where someone is asking the model to roleplay a fictional email exchange between named corporate figures in a particular company.

SoftTalker a day ago | parent | prev | next [-]

> Google has ... company culture oriented towards optimizing the energy usage/hardware costs.

Google has a company culture of luring you in with freebies and then mining your data to sell ads.

AdrianB1 a day ago | parent | prev | next [-]

> if Google can give away LLM usage (which is obviously subsidized) it can't be astronomically expensive

There is a recent article by Linus Sebastian (LTT) talking about YouTube: it is almost impossible to support the cost of building a competitor because it is astronomically expensive (vs. the potential revenue).

SecretDreams a day ago | parent | prev [-]

I do not disagree that they will get cheaper, but I am pointing out that none of this is being reflected in hardware pricing. You state LLMs are becoming more optimized (less expensive). I agree. This should have a knock-on effect on hardware prices, but it does not. Where is the disconnect? Are hardware prices a lagging indicator? Is Nvidia still a 5 trillion dollar company if we see another 33x improvement in "energy consumed per text prompt"?

zozbot234 a day ago | parent [-]

Jevons paradox. As AI gets more efficient, its potential scope expands further and the hardware it runs on becomes even more valuable.

BTW, the absolute lowest "energy consumed per logical operation" is achieved with so-called 'neuromorphic' hardware that's dog slow in latency terms but more than compensates with extreme throughput. (A bit like an even more extreme version of current NPU/TPUs.) That's the kind of hardware we should be using for AI training once power use for that workload is measured in gigawatts. Gaming-focused GPUs are better than your average CPU, but they're absolutely not the optimum.

PaulHoule a day ago | parent | prev | next [-]

It's not the hardware getting cheaper; it's that LLMs were developed when we really didn't understand how they worked, and there is still some room to improve the implementations, particularly to do more with less RAM. That's everything from doing more with fewer weights to lower-precision formats like FP16; not to mention that if you can 2x the speed, you can get twice as much done with the same RAM and all the other parts.
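
To put rough numbers on the RAM point, here's a sketch of weight memory versus precision for a hypothetical 70B-parameter model (weights only; KV cache and activations come on top):

    # Approximate weight memory: params * bytes per weight.
    PARAMS = 70e9
    BYTES_PER_WEIGHT = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

    for fmt, nbytes in BYTES_PER_WEIGHT.items():
        print(f"{fmt}: {PARAMS * nbytes / 1e9:.0f} GB")
    # fp32: 280 GB, fp16: 140 GB, int8: 70 GB, int4: 35 GB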

SecretDreams a day ago | parent [-]

Improvements in LLM efficiency should be driving hardware to get cheaper.

I agree with everything you've said; I'm just not seeing any material benefit as of now.

sothatsit a day ago | parent [-]

Inference costs falling 2x doesn’t decrease hardware prices when demand for tokens has increased 10x.

PaulHoule a day ago | parent [-]

It's the ratio. If revenue goes up 10x you can afford 10x more hardware if you can afford to do it all.

hug a day ago | parent | prev | next [-]

> I'd like to see this statement plotted against current trends in hardware prices ISO performance.

Prices for who? The prices that are being paid by the big movers in the AI space, for hardware, aren't sticker price and never were.

The example you use in your comment, RAM, won't work: It's not 3x the price for OpenAI, since they already bought it all.

xpe a day ago | parent | prev | next [-]

> I fail to see how costs can drop while valuations for all major hardware vendors continue to go up. I don't think the markets would price companies in this way if the thought all major hardware vendors were going to see margins shrink a la commodity like you've implied.

This isn't hard to see. A company's overall profits are influenced – but not determined – by the per-unit economics. For example, increasing volume (quantity sold) at the same per-unit profit leads to more profits.

doctorpangloss a day ago | parent | prev | next [-]

> I fail to see how costs can drop while valuations for all major hardware vendors continue to go up.

Yeah. Valuations for hardware vendors have nothing to do with costs. Valuations are a meaningless thing to integrate into your thinking about something objective like whether the retail costs of inference will trend down (obviously yes).

mcphage a day ago | parent | prev | next [-]

> So even if a task costs $8 today thanks to VC subsidies, I can be reasonably confident that the same task will cost $8 or less without subsidies in the not-too-distant future.

The same task on the same LLM will cost $8 or less. But that's not what vendors will be selling, nor what users will be buying. They'll be buying the same task on a newer LLM. The results will be better, but the price will be higher than the same task on the original LLM.

glemion43 a day ago | parent | prev [-]

[dead]

a day ago | parent | prev [-]
[deleted]
oceanplexian a day ago | parent | prev | next [-]

> Their pricing models are simply not sustainable. I hope everyone realizes that the current LLMs are subsidized, like your Seamless and Uber was in the early days.

If you run these models at home it's easy to see how this is totally untrue.

You can build a pretty competent machine that will run Kimi or DeepSeek for $10-20k and generate an unlimited number of tokens all day long (I did a budget version with an Epyc machine for about $4k). Amortize that over a couple of years, and it's less than most people spend on a car payment. The pricing is sustainable, and that's ignoring the fact that the big model providers are operating at economies of scale: they're able to parallelize the GPUs and pack in requests much more efficiently.

utopiah 20 hours ago | parent | next [-]

> run these models at home

Damn, what kind of home do you live in, a data center? Teasing aside, maybe a slightly better benchmark is what sufficiently acceptable model (not an objective bar, but one can rely on arguable benchmarks) you can run on infrastructure that is NOT subsidized. That might include cloud providers (e.g. OVH) or "neo" clouds (e.g. HF), but honestly that's tricky to evaluate, as they all tend to have pure players (OpenAI, Anthropic, etc.) or owners (Microsoft, NVIDIA, etc.) as investors.

Unit327 a day ago | parent | prev | next [-]

This ignores the cost of model training, R&D, managing the data centers, and more. OpenAI et al. regularly admit that all their products lose money. Not to mention that it isn't enough to cover their costs: they have to pay back all those investors while actually generating a profit at some point in the future.

Denzel a day ago | parent | prev | next [-]

Uhm, you actually just proved their point if you run the numbers.

For simplicity’s sake we’ll assume DeepSeek 671B on 2x RTX 5090 running at 2 kW at full utilization.

In 3 years you’ve paid $30k total: $20k for the system + $10k in electricity @ $0.20/kWh.

The model generates 500M-1B tokens total over 3 years @ 5-10 tokens/sec. Understand that’s total throughput for reasoning and output tokens.

You’re paying $30-$60/Mtok - more than both Opus 4.5 and GPT-5.2, for less performance and fewer features.

And as the other commenters point out, this doesn’t even factor in the extra DC costs when scaling it up for consumers, nor the costs to train the model.

Of course, you can play around with the parameters of the cost model, but this serves to illustrate that it’s not so clear-cut whether the current AI service providers are profitable or not.
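
The arithmetic above as a sketch, using the same stated assumptions ($20k system, 2 kW at $0.20/kWh, 5-10 tok/s, three years of continuous use):

    # Back-of-the-envelope cost per million tokens for the home rig.
    HOURS = 3 * 365 * 24                 # ~26,280 hours over 3 years
    total = 20_000 + 2 * HOURS * 0.20    # system + electricity, ~$30.5k

    for tps in (5, 10):
        tokens = tps * HOURS * 3600      # ~0.47B-0.95B tokens total
        print(f"{tps} tok/s -> ${total / (tokens / 1e6):.0f}/Mtok")
    # 5 tok/s -> ~$65/Mtok, 10 tok/s -> ~$32/Mtok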

kingstnap 20 hours ago | parent | next [-]

5 to 10 tokens per second is a bungus-tier rate.

https://developer.nvidia.com/blog/nvidia-blackwell-delivers-...

NVIDIA's 8xB200 gets you 30k tps on DeepSeek 671B; at maximum utilization that's 1 trillion tokens per year. At a dollar per million tokens, that's $1 million.

The hardware costs around $500k.

Now, ideal throughput is unlikely, so let's say you get half that. It's still 500B tokens per year.

Gemini 3 Flash is like $3/million tokens, and I assume it's a fair bit bigger, maybe 1 to 2T parameters. I can sort of see how you get this to work with margins, as the AI companies repeatedly assert.
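
The same arithmetic at rack scale, taking this comment's figures at face value (30k tok/s aggregate derated to 50%, $500k of hardware, $1/Mtok):

    # Rough yearly economics of an 8xB200 rack serving DeepSeek 671B.
    SECONDS_PER_YEAR = 365 * 24 * 3600
    tokens_per_year = 30_000 * SECONDS_PER_YEAR * 0.5   # ~473B tokens
    revenue = tokens_per_year / 1e6 * 1.00              # at $1 per Mtok
    print(f"{tokens_per_year / 1e9:.0f}B tok/yr -> ${revenue:,.0f}/yr")
    # ~473B tok/yr -> ~$473,040/yr, recouping the $500k rack in about
    # a year, before power, DC, and staffing costs.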

Denzel 19 hours ago | parent [-]

Cool, so that potential 5x cost improvement just got delivered this year. A company can continue running the previous generation until EOL, or take a hit by writing off the residual value; either way they’ll have a mixed cost model that puts their token cost somewhere between the previous and current gens.

Also, you’re missing material capex and opex costs from a DC perspective. Certain inputs exhibit diseconomies of scale when your demand outstrips market capacity. You do notice electricity cost is rising and companies are chomping at the bit to build out more power plants, right?

Again, I ran the numbers for simplicity’s sake to show it’s not clear cut that these models are profitable. “I can sort of see how you can get this to work” agrees with exactly what I said: it’s unclear, certainly not a slam dunk.

Especially when you factor in all the other real-world costs.

We’ll find out soon enough.

surajrmal 13 hours ago | parent [-]

Google runs everything on their own TPUs, which are substantially less costly to make and use less energy to run. While I'm sure OpenAI and others are bleeding money by subsidizing things, I'm not entirely sure that's true for Google (even though it would actually be easier for them if they wanted to).

13 hours ago | parent | prev [-]
[deleted]
a day ago | parent | prev | next [-]
[deleted]
lelanthran a day ago | parent | prev [-]

> Amortize that over a couple years, and it's cheaper than most people spend on a car payment.

I'm not parsing that: do you mean that the monthly cost of running your own rig 24x7 is less than a monthly car payment?

Whether true or false, I don't get how that is relevant to proving either that the current LLMs are not subsidised, or proving that they are.

franktankbank a day ago | parent [-]

If true, it means there's a lower bound that is profitable, at least taking into account current apparent purchasing costs and energy consumption.

snarf21 a day ago | parent | prev | next [-]

I'm not sure. I asked one about a potential bug in iOS 26 yesterday, and it told me that iOS 26 does not exist and that I must have meant iOS 16. iOS 26 was announced last June and has been live since September. Of course, I responded that the current iOS version is 26 and got the obligatory "Of course, you are right! ramble ramble ramble..."

amluto a day ago | parent | next [-]

Was this a GPT model? OpenAI seems to have developed an almost-acknowledged inability to usefully pre-train a model after mid-2024. The recent GPT versions are conspicuously lacking in newer knowledge.

The most amusing example I’ve seen was asking the web version of GPT-5.1 to help with an installation issue with the Codex CLI (I’m not an npm user so I’m unfamiliar with the intricacies of npm install, and Codex isn’t really an npm package, so the whole use of npm is rather odd). GPT-5.1 cheerfully told me that OpenAI had discontinued Codex and hallucinated a different, nonexistent program that I must have meant.

(All that being said, Gemini is very, very prone to hallucinating features in Google products. Sometimes I wonder whether Google should make a list of Gemini-hallucinated Google features and use the list to drive future product development.)

buu700 a day ago | parent | next [-]

Gemini is similar. It insists that information from before its knowledge cutoff is still accurate unless explicitly told to search for the latest information before responding. Occasionally it disagrees with me on the current date and makes sarcastic remarks about time travel.

One nice thing about Grok is that it attempts to make its knowledge cutoff an invisible implementation detail to the user. Outdated facts do sometimes slip through, but it at least proactively seeks out current information before assuming user error.

franktankbank a day ago | parent | prev [-]

LLMs solve the naming problem, so now there are just 1 things wrong with software development. I can't tell if it's a really horrible idea that ultimately leads to a trainwreck, or freedom!

doug_durham a day ago | parent | prev | next [-]

Sure. You have to be mindful of the training cutoff date for the model. By default, models won't search the web; they rely on data baked into their internal weights. That said, the ergonomics of this are horrible and a huge time waste. If I run into this situation I just say "Search the web".

bluGill a day ago | parent | next [-]

If the training cutoff is before iOS 26, then the correct answer is "I don't know anything about it, but it is reasonable to think it will exist soon". Saying "of course you are right" is a lie.

20 hours ago | parent [-]
[deleted]
realharo a day ago | parent | prev [-]

That will only work as long as there is an active "the web" to search. Unless the models get smart enough to figure out the answer from scratch.

jerezzprime a day ago | parent | prev | next [-]

Let's imagine a scenario. For your entire life, you have been taught to respond to people in a very specific way. Someone will ask you a question via email and you must respond with two or three paragraphs of useful information. Sometimes when the person asks you a question, they give you books that you can use, sometimes they don't.

Now someone sends you an email and asks you to help them fix a bug in Windows 12. What would you tell them?

soco a day ago | parent | next [-]

I would say "what the hell is windows 12". And definitely not "but of course, excellent question, here's your brass mounted windows 12 wheeler bug fixer"

mock-possum a day ago | parent | prev [-]

I mean I would want to tell them that windows 11 is the most recent version of windows… but also I’d check real quick to make sure windows 12 hadn’t actually come out without me noticing.

Terr_ a day ago | parent [-]

> check real quick

"Hey LLMBot, what's the newest version of Very Malicious Website With Poison Data?"

kaffekaka a day ago | parent | prev | next [-]

The other way around, but a month or so ago Claude told me that a problem I was having was likely caused by my Fedora version, "since Fedora 42 is long deprecated".

palmotea a day ago | parent [-]

> The other way around, but a month or so ago Claude told me that a problem I was having was likely caused by my Fedora version, "since Fedora 42 is long deprecated".

Well, obviously, since Fedora 42 came out in 1942, when men still wore hats. Attempting to use such an old, out of style Linux distro is just a recipe for problems.

kaffekaka 16 hours ago | parent [-]

I apologize for the confusion, you are absolutely right!

PaulHoule a day ago | parent | prev | next [-]

You are better off talking to Google's AI mode about that sort of thing because it runs searches. It does great talking about how the Bills are doing, because that's a good example where timely results are essential.

I haven't found any LLM whose answers about Arknights I totally trust; there is no LLM that seems to understand how Scavenger recovers DP, for instance. Allegedly there is a good Chinese wiki for that game which I could crawl, store in a JetBrains project, and ask Junie questions about, but I can't resolve the URL.

perardi a day ago | parent [-]

Even with search mode, I’ve had some hilarious hallucinations.

This was during the Gemini 2.5 era, but I got some just bonkers results looking for Tears of the Kingdom recipes. Hallucinated ingredients, out-of-nowhere recipes, and Breath of the Wild recipes and effects transposed into Tears of the Kingdom.

_puk a day ago | parent [-]

You also have to be so exact...

Literally just searched for something, slight typo.

An A-vs-B type request. The search comes back with "sorry, no information relevant to your search".

Search results are just a spammy mess.

Correct the typo and you get a really good insight.

cpursley a day ago | parent | prev [-]

Which one? Claude (and to some extent, Codex) are the only ones that actually work when it comes to code. Also, they need context (like docs, skills, etc.) to be effective. For example: https://github.com/johnrogers/claude-swift-engineering

Night_Thastus a day ago | parent | prev | next [-]

Yep. The goal is to build huge amounts of hype and demand, get their hooks into everyone, and once they've killed off any competition and built up the walls, crank up the price.

The prices now are completely unsustainable. They'd go broke if it weren't for investors dumping out their pockets. People forget that what we have now only exists because of absurd amounts of spending on R&D, mountains of dev salaries, huge data centers, etc. That cannot go on forever.

brightball a day ago | parent | prev | next [-]

I've been explaining that to people for a while now, along with a strong caution about how people are pricing these tools. It's all going to go up once dependency is established.

The AWS price increase on 1/5 for GPUs on EC2 was a good example.

renegade-otter a day ago | parent [-]

AWS in general is a good example. It used to be much more affordable and better than boutique hosting. Now AWS costs can easily spiral out of control. Somehow I can run a site for $20 on Digital Ocean, but with AWS it always ends up at $120.

RDS is a particular racket that will cost you hundreds of dollars for a rock-bottom tier. Again, Digital Ocean has tiers below $20 per month that will serve many a small business. And yet, AWS is the default go-to at this point because the lock-in is real.

xienze a day ago | parent [-]

> RDS is a particular racket that will cost you hundreds of dollars for a rock-bottom tier. Again, Digital Ocean has tiers below $20 per month that will serve many a small business. And yet, AWS is the default go-to at this point because the lock-in is real.

This is a little disingenuous though. Yeah you can run a database server on DO cheaper than using RDS, but you’ll have to roll all that stuff that RDS does yourself: automatic backups/restores, tuning, monitoring, failover, etc. etc. I’m confident that the engineers who’ve set up those RDS servers and the associated plumbing/automation have done a far better job of all that stuff than I ever could unless I spent a lot of time and effort on it. That’s worth a premium.

threethirtytwo a day ago | parent | prev | next [-]

The pricing will go down once the hardware prices go down. Historically hardware prices always go down.

Once hardware prices go low enough, pricing will drop to the point where it doesn't even make sense to sell current LLMs as a service.

I would imagine it's possible that, if the aforementioned future ever comes to pass, there will be new forms of ultra-high-tier compute running other types of AI more powerful than an LLM. But I'm pretty sure AI in its current state will one day be running locally on desktops and/or handhelds, with the former being more likely.

notTooFarGone 20 hours ago | parent | next [-]

Are hardware prices going down when each new generation improves less and less?

threethirtytwo 12 hours ago | parent [-]

Yeah, it’s not just a demand-side thing. Costs go down as well. Every leap in new hardware costs a lot in initial investment, and that’s included in a lot of the pricing.

_puk a day ago | parent | prev [-]

Hopefully we'll get some real focus on making LLMs work amazingly well on limited hardware; the knock-on effect of that would be amazing when the hardware eventually drops in price.

scuff3d a day ago | parent | prev | next [-]

We're building a house on sand. Eventually the whole damn thing is going to come crashing down.

djeastm 12 hours ago | parent | prev | next [-]

>I hope everyone realizes that the current LLMs are subsidized

This is why I'm using it now as much as possible to build as much as possible in the hopes of earning enough to afford the later costs :D

Kuinox 2 days ago | parent | prev | next [-]

It would mean that inference is not profitable. Calculating inference costs shows it's profitable, or close to it.

renegade-otter a day ago | parent [-]

Inference costs have in fact been crashing, going from astronomical to... lower.

That said, I am not sure that this indicator alone tells the whole story - it may even hide it, sort of like EBITDA.

Kuinox a day ago | parent [-]

I think there will still be cheap inference; what will rise in cost is frontier model subscriptions. That is the thing that is not profitable.

DamnInteresting 13 hours ago | parent | prev | next [-]

> I hope everyone realizes that the current LLMs are subsidized, like your Seamless and Uber was in the early days.

A.I. == Artificially Inexpensive

wvenable a day ago | parent | prev | next [-]

> I hope everyone realizes that the current LLMs are subsidized

Hell ya, get in and get out before the real pricing comes in.

Terr_ a day ago | parent [-]

"I'm telling ya kid, the value of nostalgia can only go up! This is your chance to get in on the ground-floor so you can tell people about how things used to be so much better..."

ssss11 21 hours ago | parent | prev | next [-]

Wait for the ads

turtletontine a day ago | parent | prev | next [-]

On the bright side, I do think at some point after the bubble pops, we’ll have high quality open source models that you can run locally. Most other tech company business plans follow the enshittification cycle [1], but the interchangeability of LLMs makes it hard to imagine they can be monopolized in the same way.

1: I mean this in the strict sense of Cory Doctorow’s theory (https://en.wikipedia.org/wiki/Enshittification?wprov=sfti1#H...)

featherless a day ago | parent | prev | next [-]

Except most of those services don't have at-home equivalents that you can increasingly run on your own hardware.

oceanplexian a day ago | parent | next [-]

I run models with Claude Code (using the Anthropic API feature of llama.cpp) on my own hardware, and it works every bit as well as Claude worked literally 12 months ago.

If you don't believe me and don't want to mess around with used server hardware you can walk into an Apple Store today, pick up a Mac Studio and do it yourself.

Eggpants a day ago | parent | next [-]

I’ve been doing the same with GPT-OSS-120B and have been impressed.

Only gotcha is that Claude Code expects a 200k context window, while that model maxes out at 130k or so. I have to do a /compact when it gets close. I’ll have to see if there is a way to set the max context window in CC.

Been pretty happy with the results so far as long as I keep the tasks small and self contained.

petesergeant a day ago | parent [-]

I've been making use of gpt-oss-120b extensively for a range of projects, commercial and private, because providers on OpenRouter make it essentially free and instant, and it's roughly as capable as o4-mini was, in my experience.

That said, I'm a little surprised to hear you're having great success with it as a coding agent. It's "obviously" worse than the frontier models, and even they can make blindingly dumb decisions pretty regularly. Maybe I should give it a shot.

icedchai a day ago | parent | prev [-]

What's your preferred local model?

a day ago | parent | prev [-]
[deleted]
chiengineer a day ago | parent | prev | next [-]

They just need to figure out the KV cache turned into a magic black box; after that it'll be fine.

startupsfail a day ago | parent | prev | next [-]

The results are repeatable. Models are performing with predictable error rates on the tasks that these models have been trained and tested on.

makach a day ago | parent | prev [-]

AI is built to be non-deterministic. Variation is built into each response. If it weren't, I would expect AI to have died out years ago.

The pricing and quality of Copilot and Codex (which I am experienced in) feel like they are getting worse, but I suspect it may be that my expectations are getting higher as the technology matures...