| ▲ | jimbo808 5 days ago |
| I'm not sure how this will play out long term, but I really am not a fan of having to feel like I'm using a limited resource whenever I use an LLM. People like unlimited plans, we are used to them for internet, text messaging, etc. The current pricing models just feel bad. |
|
| ▲ | andruby 5 days ago | parent | next [-] |
Unlimited works well for everything that is “too cheap to meter”. Internet, text messages, etc. are roughly that: the direct costs are negligible. That’s not the case with LLMs at this moment. There are significant direct costs to each long-running agent. |
| |
| ▲ | rmujica 5 days ago | parent | next [-] | | Internet and SMS used to be expensive and metered until they weren't, thanks to technological advances and expanded use. I think LLMs will follow the same path, maybe on a shorter timespan. | | |
| ▲ | cmsjustin 5 days ago | parent | next [-] | | They were not expensive to operate; they were only expensive for consumers | | |
| ▲ | tialaramex 5 days ago | parent | next [-] | | Right, that's crucial to understand. In 1985 you could dial direct from England to the US, but it was eye-wateringly expensive: £2 per minute. An hour's call to your mum? That's over £100. But the cost to Bell and British Telecom was not £2 per minute, or £1 per minute, or even 1p per minute; it was nothing at all. Their costs were not for the call but for the infrastructure over which the call was delivered, a transatlantic cable. If there was one ten-minute call a week, essentially at random, that cable must still exist; if there are 10 thousand call minutes per week, a thousand times more, it's the same cable. So the big telcos all just picked a number and understood it as basically free income. If everybody agrees this call costs £2, then it costs £2, right? And those 10 thousand call minutes generate a million-pound annual income (10,000 minutes × £2 × 52 weeks ≈ £1M). It's maybe easier for Americans to understand if you tell them that outside the US, local telephone calls cost money back then. Why were your calls free? Because why not: the decision to charge for calls is arbitrary, since the calls don't actually cost anything, but you have to charge somehow to recoup the maintenance costs. In the US, long-distance calls were more expensive to make up for this for a time; today it's all absorbed in a monthly access fee on most plans. | | |
| ▲ | daveguy 5 days ago | parent | next [-] | | This analysis doesn't account for the limited bandwidth available for call delivery on the plain old telephone service (POTS). They did squeeze extra money out of the system as a monopoly, but the cost was zero only if you don't count the cost of operating and maintaining the network, or the opportunity cost of having much less bandwidth than is available today. For the former, they still had to fix problems. For the latter, if calls had been priced at pennies, everyone would have gotten "all circuits are busy" all the time; a single cable wasn't capable of carrying 10,000 calls back then. Pricing to keep usage within available bandwidth was as important as recouping infrastructure costs and ongoing maintenance. There's also a lemonade-stand pricing effect: charge too little and you don't cover costs, but charge too much and not enough people do business with you, and you still don't cover costs. Also, Ma Bell's breakup was settled in 1982 (taking effect in 1984), but the regional monopolies lasted a lot longer (Telecommunications Act of 1996). | | |
| ▲ | tialaramex 5 days ago | parent [-] | | TAT-7, which was in operation in 1985 when I cited the £2-per-minute price, carried 4000 simultaneous calls, i.e. up to £8000 per minute. Its successor TAT-8 carried ten times as many calls a few years later; industry professionals opined that there was likely no demand for so many transatlantic calls and so it would never be full. Less than two years later TAT-8's capacity maxed out and TAT-9 was already being planned. Today lots of people have home Internet service significantly faster than all three of those transatlantic cables put together. | |
| |
| ▲ | tqwhite 5 days ago | parent | prev [-] | | There was some capital expenditure that had to be paid for. In the US, AT&T had only just been deregulated by then, so the prices were not entirely 'out of thin air'. |
| |
| ▲ | KaiserPro 5 days ago | parent | prev | next [-] | | Laying the cables required a huge amount of capital, and making that feasible required financial engineering. That translates to high operating expenses. | |
| ▲ | AlotOfReading 5 days ago | parent [-] | | SMS was originally piggybacking off unused bytes in packets already being sent to the tower, which was being paid for by existing phone bills. The only significant expenses involved transiting between networks. That was a separate surcharge in the early days. | | |
| ▲ | KaiserPro 4 days ago | parent | next [-] | | > SMS was originally piggybacking off unused bytes in packets already being sent to the tower For GSM? Not really. Yes, SMS was sent at a lower priority than voice, so if there was a lot of voice traffic, SMSes wouldn't be delivered as quickly. But, to my original point, do you think the towers were free? That's the point: it took a shit-ton of capital to build the network. Did the mobile networks make a lot of cash? Also yes. But some also took on huge amounts of debt and went bust. | |
| ▲ | ssk42 5 days ago | parent | prev [-] | | Used to be? What changed? | | |
| ▲ | daveguy 5 days ago | parent | next [-] | | People started sending a lot more texts and making a lot fewer phone calls. And you can only piggyback so many text messages on the call packets. | |
| ▲ | KaiserPro 4 days ago | parent | prev [-] | | We don't use 2G GSM anymore. |
|
|
| |
| ▲ | hkt 5 days ago | parent | prev [-] | | Competition is the thing. Prices will drop as more AI code assistants get more adoption. Prices will probably also drop if anyone ever works out how to feasibly compete with NVIDIA. Not an expert here, but I expect they're worried about competition regulators, who will be watching them very closely. | | |
| ▲ | troupo 5 days ago | parent | next [-] | | > Prices will drop as more AI code assistants get more adoption. No, they won't, because "AI assistants" are mostly wrappers around a very limited number of third-party providers. And those providers are hemorrhaging money like crazy, and will raise prices, limit available resources, and cut off external access — all at the same time. Some of this is already happening. | |
| ▲ | lelanthran 4 days ago | parent | prev [-] | | > Prices will drop as more AI code assistants get more adoption. What's the reasoning behind this? They are already doing the efficient "economies of scale" thing, and they are already at full capacity (hence the rate limiting). The only way forward for these AI providers is to raise prices, not lower them. | |
| ▲ | hkt 2 days ago | parent [-] | | The more AI assistants there are that are roughly equally competent, the more price becomes a factor. Mobility between providers is quick; it only takes one company willing to burn a lot of cash to win users or strategically hobble a competitor to start a price war. Maybe I'm wrong, but intuitively this feels like the probable endgame. |
|
|
| |
| ▲ | alwillis 5 days ago | parent | prev | next [-] | | Yes and no. It’s very expensive to create these models and serve them at scale. Eventually the processing power required to create them will come down, but that’s going to be a while. Even if a breakthrough GPU technology were announced tomorrow, it would take several years before it could be put into production. And pretty much only TSMC can produce cutting-edge chips at scale, and they have their hands full. Between Anthropic, xAI and OpenAI, these companies have raised about $84 billion in venture capital… VCs are going to want a return on their investment. So it’s going to be a while… |
| ▲ | margalabargala 5 days ago | parent | prev | next [-] | | SMS was designed from the start to fit in the handful of unused bytes in the tower handshake that was happening anyway, hence the 160-char limit. Its marginal cost has always been essentially zero on the supply side. | |
| ▲ | RF_Savage 5 days ago | parent [-] | | SMS routing and billing systems did cost money.
Especially billing, as the standards had nothing for it, so it was done by 3rd party software for a very long time. | | |
| ▲ | margalabargala 5 days ago | parent [-] | | Of course, how pleasingly circular. "It's so expensive because it costs so much to charge you for it". | | |
| ▲ | baq 5 days ago | parent [-] | | Exactly! AWS is so expensive because it can be so cheap. Billing was the true innovation. |
|
|
| |
| ▲ | xtracto 5 days ago | parent | prev | next [-] | | I think LLMs follow more of an energy analogy: gas or electricity, or even water. How much has any of these decreased over the last 5 decades? The problem is that, as of right now, LLM cost is linearly (if not exponentially) related to the output. It's basically "transferring energy" converted into bytes. So unless we see some breakthrough in energy generation, or in using it better, it will be difficult to scale. This makes me wonder: would it be possible to pre-compute some kind of "rainbow tables" equivalent for LLMs? Either stored in the client or in the server, so as to reduce the computing needed for inference. | |
| ▲ | valenterry 4 days ago | parent [-] | | I don't think so. Yes, LLMs use electricity, but they use it in the data-center, not in your home. That's very different, because it's cheap to transfer tokens from the data-center to your home, but it's not cheap to transfer electricity from the data-center to your home. And that matters, because we can build a data-center in a place where there's lots of renewable and hence cheap energy (e.g. from solar or from water/wind). If you think about it, LLMs are used mostly when people are awake, at least right now. And when is the sun shining? Right. So, build a data-center somewhere where land is cheap and lots of solar panels can be built right next to it. Sure, some other energy source will be used for stability etc., but it won't be as expensive as the energy price for your home. > This makes me wonder: would it be possible to pre-compute some kind of "rainbow tables" equivalent for LLMs? Already happening. Read up on how those companies cache prompt prefixes, etc. |
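To make "caching prompt prefixes" concrete, here is a minimal sketch using the Anthropic Messages API's cache_control blocks; the model name is a placeholder and the context string is a stand-in for a large reused prefix:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Stand-in for a big prefix reused across many calls (docs, a codebase summary, etc.)
    long_shared_context = "...many thousands of tokens of shared context..."

    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model name
        max_tokens=1024,
        system=[{
            "type": "text",
            "text": long_shared_context,
            "cache_control": {"type": "ephemeral"},  # provider caches the prefix up to this block
        }],
        messages=[{"role": "user", "content": "Summarize module X."}],
    )

Later calls that start with the same prefix skip recomputing it and are billed at a discounted input rate, which is exactly the "rainbow table" flavor of trading storage for compute.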
| |
| ▲ | beefnugs 3 days ago | parent | prev | next [-] | | Isn't it the exact opposite? No one is making a profit yet; it's a mad dash to monopolize the market. It has to get more expensive to ever turn a profit, so the screws will turn. | |
| ▲ | whimsicalism 5 days ago | parent | prev [-] | | maybe, but they are not nearly as comparable as you’re making them out to be |
| |
| ▲ | MuffinFlavored 4 days ago | parent | prev [-] | | > That’s not the case with LLMs at this moment. I'd be curious to know how many tokens the average $200/mo user uses and what it costs on their end. |
|
|
| ▲ | KronisLV 5 days ago | parent | prev | next [-] |
I personally take issue with them expecting your usage to be more or less consistent throughout the month. Instead, I might have low usage for most of the month and then 11-hour binges on a few days, which in most cases means running into rate limits (either that, or token limits on inputs). That's why using the API directly and paying per token for anything past that basic usage feels a bit nicer, since it's my wallet that becomes the limitation, not some arbitrary limits dreamed up by others. Plus with something like OpenRouter, you can also avoid subscription-tier limits like https://docs.anthropic.com/en/api/rate-limits#rate-limits Though for now Gemini 2.5 Pro seems to work a bit better than Claude for my code writing/refactoring/explanation/exploration needs. Curious what other cost-competitive options are out there. |
| |
| ▲ | bugglebeetle 5 days ago | parent | next [-] | | Gemini 2.5 Pro is a better coding model, but Gemini CLI is way behind Claude Code, perhaps because the model itself isn’t well-tuned for agentic work. If you’re just doing targeted refactoring and exploration, you can copy and paste back and forth from the web app for $20 a month. | | |
| ▲ | virtualritz 4 days ago | parent | next [-] | | Not if you write Rust. It ranges from regularly producing code with unbalanced braces or quote characters to destroying well-working code. It also easily gets into loops where it can't solve a problem: it comes up with a half-working solution A, throws it away, comes up with B, then C, then goes back to A, etc. I run Gemini Pro from within CC, but I only use it for analysis and planning, for which it is better than Claude (Opus). I guess if your target language is Python or JS/TS etc., your mileage may be considerably better. For Rust it's simply not true. | |
| ▲ | bugglebeetle 4 days ago | parent [-] | | Are you saying Claude is better at writing Rust or Gemini? Or Gemini in Gemini CLI? Not following. In my experience, Gemini 2.5 PRO is better at writing Rust overall. | | |
| ▲ | karthikkolli 3 days ago | parent [-] | | My experience is that Claude is better at writing Rust than Gemini. Gemini CLI gets confused easily but has a good higher-level picture. In my use case Gemini is the architect; it even provides full code changes to Claude (they don't directly compile most of the time, and miss moves of structs) and Claude makes those changes. It works better for me that way. |
|
| |
| ▲ | KronisLV 5 days ago | parent | prev | next [-] | | I mostly use RooCode nowadays, which works well enough with both Claude and Gemini and other models, even locally hosted ones. Decoupling the LLM vendor from the tools might miss out on some finer features, but also gives me a little bit more freedom, much like how you can also do with the Copilot plugins and Continue.dev and a few others. Note: all of them sometimes screw up applying diffs, but in general are good enough. | |
| ▲ | ewoodrich 5 days ago | parent | prev [-] | | Gemini 2.5 Pro made some big post-launch improvements for tool calling/agentic usage that made it go from “pulling teeth” to “almost as smooth as Claude” in Cline/Roo Code (which is saying something since Cline was originally built around Claude tool use specifically). So the team at least seems to be aware of its shortcomings in that area and working to improve it with some success which I appreciate. But you are correct that Gemini CLI still lags behind for whatever reason. It gets stuck in endless thought loops way too often for me, like maybe 1/15 tasks hits a thought loop burning API credits or it just never exits from the “Completing task, Verifying completion, Reviewing completion, Assessing completion status…” phase (watching the comical number of ways it rephrases it is pretty funny though). Meanwhile I’ve only had maybe one loop over a period of a couple months using Gemini 2.5 Pro heavily in Roo Code with the most recent version so it seems like an issue with the CLI specifically. | | |
| ▲ | jjani 4 days ago | parent [-] | | Even just a week ago Gemini was still outputting the same message twice almost every time in Cline, I doubt that has changed in the last week. | | |
| ▲ | ewoodrich 4 days ago | parent [-] | | Hmmm, I haven't used Cline for several months, but in Roo I have made some (fairly minor) changes to my Roo Code modes and have not experienced this at all, especially not that frequently. I use GPT-4.1 via my Copilot Pro plan and Gemini 2.5 Pro via OpenRouter as my daily drivers. That being said, I have found using Gemini through the AI Studio API much less reliable in Roo/Cline, and I'm not sure how you are accessing it. I have auto-approve turned on for most routine tasks and run Gemini on relatively long (by Cline/Roo standards) tasks unsupervised without issue, with the normal caveat of any LLM going off the deep end every now and then, but I haven't seen that more frequently with Gemini except in the first couple of months after 2.5 Pro was released. |
|
|
| |
| ▲ | tqwhite 5 days ago | parent | prev | next [-] | | This is my strategy as well. I definitely have surges of usage. Except for one catastrophic binge where I accidentally left Opus on the whole time (KILL ME!!!), I use around $150/month. I like having the spigot off when I am not working. Would the $100/month plan plus API for overflow come out ahead? Certainly in some months. Over the year, I don't know. I'll let you know. |
| ▲ | j45 5 days ago | parent | prev [-] | | Can anyone offer a cost comparison between Gemini 2.5 Pro and Claude Code, on a plan or via the API? |
|
|
| ▲ | Jcampuzano2 5 days ago | parent | prev | next [-] |
My opinion is that all of these tools should completely get rid of the "pay 20/month, 200/month" tiers that just buy access to some opaque, rate-limited allowance that becomes hard to track. Mask off completely and make it usage-based for everyone. You could do something for trial users, like making the first 20 (pick your number here) requests free if you really need to get people on board. Or you could do tiered pricing: first 20 free, next 200 at rate X, next 200 at rate X*1.25, and then for really high-usage users charge the full cost to make up for their extreme patterns. With this they can still subsidize the people who stay lower on usage for market share. Of course you can replace 200 requests with token usage if that makes more sense, but I'm sure they can do the math to make it work with request limits if they try hard enough. Offer better-than-OpenRouter pricing and that keeps people in your system instead of reaching for 3rd-party tools. If your tool is that good, it will get users even with usage-based pricing. The issue is that all the providers are both subsidizing users to gain market share and trying to prohibit bad actors and the most egregious usage patterns. The only way this becomes a 100% non-issue is usage-based pricing for everything with no entry fee. But this also hurts some who pay a subscription but DON'T use enough to account for the usage-based fees, so some sales people probably don't like that option either. It also makes it easier for people to shop around instead of feeling stuck for a month or two, since most people don't want multiple subs at once. |
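To illustrate, a toy sketch of that tiered scheme using the example numbers above; the 2x "full cost" rate at the end is an invented assumption:

    def tiered_cost(requests: int, x: float) -> float:
        # Illustrative only: first 20 free, next 200 at rate x, next 200 at 1.25x,
        # and everything beyond at an assumed full-cost rate of 2x.
        tiers = [(20, 0.0), (200, x), (200, 1.25 * x)]
        cost, remaining = 0.0, requests
        for size, rate in tiers:
            used = min(remaining, size)
            cost += used * rate
            remaining -= used
        return cost + remaining * 2 * x

    print(tiered_cost(500, 0.10))  # 61.0: 20 free, then 20.00 + 25.00 + 16.00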
| |
| ▲ | ebiester 4 days ago | parent | next [-] | | Then, however, they would be accountable for how many times AI fails. If I'm paying a flat rate, the only economic cost I am worrying about is "will this be faster than me doing it myself if it fails once or twice?" If I am paying per token, and it goes off for 20 minutes without solving the problem, I've just spent $$ for no result. Why would I even bother using it? For something like Claude Code, that's an even more concerning issue - how many background tasks have to fail before I reach my monthly spending limit? How do I get granular control to say "only spend 7 dollars on this task - stop if you cannot succeed." - and I have to write my own accounting system for whether it succeeds or fails. | |
| ▲ | bananapub 5 days ago | parent | prev | next [-] | | > Mask off completely and make it usage-based for everyone. You can already pay per token by giving Claude Code an API key, if you want. Thus, the subtext of every complaint in this thread is that people want "unlimited", they want their particular use to be under whatever the cap is, and they want it to be cheap. | |
| ▲ | Wowfunhappy 4 days ago | parent [-] | | No, I'm explicitly not saying that! I'm saying that I'd really like the rolling window to be less than a full week, because that's such a long time to wait if I exhaust the limit! |
| |
| ▲ | vineyardmike 5 days ago | parent | prev | next [-] | | > My opinion is that all of these tools should completely get rid of the "pay 20/month, 200/month" tiers. I think you should just subscribe to a preset allotment of tokens at a certain price, or a base tier with incremental usage costs for models that aren't tiny (like paying per minute for "long distance calling"). I use an LLM tool that shows the cost associated with each message/request, and most are pennies each. There's a point where the friction of paying is a disincentive to using it. Imagine you had to pay $0.01 every time you Googled something: most people would never use the product, because trying to pay $0.30/mo for one day a month of usage is annoying. And no one would want to prepay and fund an account for a product they weren't familiar with. No consumer likes microtransactions. No one wants to hear this, but the answer is advertising, and it will change the game for LLMs. Once you can subsidize the lowest-end usage, the incentive for businesses to offer these $20 subscriptions will change, and they'd charge per-usage rates for commercial users. | |
| ▲ | troupo 5 days ago | parent [-] | | > you should just subscribe to a preset allotment of tokens at a certain price The problem is that there's no way to gauge or control token usage. I have no idea why Claude Code reports that it consumed X tokens now and Y tokens later, or what to do about it |
| |
| ▲ | CodeBrad 5 days ago | parent | prev | next [-] | | I think Claude Code also already has the option to provide an API key directly for usage based pricing. I'm a fan of having both a subscription and a usage based plan available. The subscription is effectively a built in spending limit. If I regularly hit it and need more value, I can switch to an API key for unlimited usage. The downside is you are potentially paying for something you don't use, but that is the same for all subscription services. | | |
| ▲ | tqwhite 5 days ago | parent | next [-] | | I use API but think about getting the $100/mo plan and using API for overflow if it occurs. But I have slow months and think that might not actually be the winner. Basically I'm going to wait and see before I sign up for auto-pay. | |
| ▲ | raincole 4 days ago | parent | prev [-] | | Given how expensive Claude Code is if you use an API key, I think it's safe to assume the subscription model is bleeding money. | |
| ▲ | Filligree 4 days ago | parent [-] | | Claude is also really expensive compared to every other model. Maybe that reflects higher underlying costs. Maybe their API prices are just inflated. |
|
| |
| ▲ | jononor 5 days ago | parent | prev [-] | | Investors love MRR/ARR, so I do not think we will see that as the main option anytime soon.
That said, you can use the Claude API to get usage-based billing. |
|
|
| ▲ | thorum 5 days ago | parent | prev | next [-] |
| The long term is unlimited access to local LLMs that are better than 2025’s best cloud models and good enough for 99% of your needs, and limited access to cloud models for when you need to bring more intelligence to bear on a problem. LLMs will become more efficient, GPUs, memory and storage will continue to become cheaper and more commonplace. We’re just in the awkward early days where things are still being figured out. |
| |
| ▲ | pakitan 5 days ago | parent [-] | | I'm often using LLMs for stuff that requires recent data. No way I'm running a web crawler in addition to my local LLM. For coding it could theoretically work, as you don't always need the latest and greatest, but it would still make me anxious. | |
| ▲ | data-ottawa 5 days ago | parent [-] | | That’s a perfect use case with MCP though. My biggest issue is local models I can run on my m1/m4 mbp are not smart enough to use tools consistently, and the context windows are too small for iterative uses. The last year has seen a lot of improvement in small models though (gemma 3n is fantastic), so hopefully it’s only a matter of time. |
|
|
|
| ▲ | qiller 5 days ago | parent | prev | next [-] |
| I'm ok using a limited resource _if_ I know how much of it I am using. The lack of visible progress towards limits is annoying. |
| |
| ▲ | blalezarian 5 days ago | parent | next [-] | | Totally agree with this. I live in constant anxiety, never knowing how far into my usage I am. | |
| ▲ | steveklabnik 5 days ago | parent | prev | next [-] | | npx ccusage@latest I'm assuming it'll get updated to include these windows as well. Pass in "blocks --live" to get a live dashboard! | | |
| ▲ | data-ottawa 5 days ago | parent | next [-] | | Oh wow, this showed me the usage stats for the period before ccusage was installed, that’s very helpful especially considering this change. ETA: You don’t need to authenticate or share your login with this utility, basically zero setup. | |
| ▲ | mtmail 5 days ago | parent | prev | next [-] | | Package page (with screenshot) https://www.npmjs.com/package/ccusage | |
| ▲ | bravura 5 days ago | parent | prev [-] | | Does ccusage (or Claude Code with a subscription) actually tell you what the limits are or how close you are to them? | |
| |
| ▲ | flkiwi 5 days ago | parent | prev | next [-] | | It's not exactly the same thing, but imagine my complete surprise when, in the middle of a discussion with Copilot and without warning, it announced that the conversation had reached its length limit and I had to start a new one with absolutely no context from the current one. Copilot has many, many usability quirks, but that was the first that actually made me mad. | | |
| ▲ | jononor 5 days ago | parent | next [-] | | ChatGPT and Claude do the same. And I have noticed that model performance can often degrade a lot before such a hard limit. So even when not hitting the hard limit, splitting out to a new session can be useful.
Context management is the new prompt engineering... | |
| ▲ | stronglikedan 5 days ago | parent | prev [-] | | The craziest thing to me is that it actually completely stopped you in your tracks instead of upselling you on the spot to continue. |
| |
| ▲ | mvieira38 5 days ago | parent | prev [-] | | You can't really predict usage of output tokens either, so this is especially concerning | |
| ▲ | qiller 5 days ago | parent [-] | | Like when Claude suddenly decides it's not happy with a tiny one-off script and generates 20 refined versions :D |
|
|
|
| ▲ | nine_k 4 days ago | parent | prev | next [-] |
Resources that are "unlimited" in marketing speak are rate-limited in practice. Your unlimited internet connection limits your daily transfer by bandwidth, both at your port and at the remote service's ports. Your daily number of SMSes is limited by the sending rate. Your all-you-can-eat restaurant order is limited by your belly. No wonder access to an expensive API like an LLM is also rate-limited. What does surprise me is that you can't buy an extra serving by paying more (twice the limit for 3x the cost, for instance). Either the subscriptions don't make enough money, or the limits reflect datacenter capacity and there is no spare capacity for premium plans. |
| |
| ▲ | FanaHOVA 4 days ago | parent [-] | | You can pay more. It's unlimited (sorta) through API at API pricing. |
|
|
| ▲ | andix 5 days ago | parent | prev | next [-] |
I guess you need to get used to it. LLM token usage translates directly into energy consumption. There are no flat-fee electricity plans either; that wouldn't make any sense. |
| |
| ▲ | idunnoboutthat 5 days ago | parent [-] | | that's true of everything on the internet. | | |
| ▲ | andix 5 days ago | parent | next [-] | | Yes, but for most things it's not significant. For example, Stack Overflow used to handle all their traffic from 9 on-prem servers (not sure if this is still the case), with millions of daily users. Power consumption and hardware costs are completely insignificant in that case. LLM inference pricing, by contrast, is mostly driven by power consumption and hardware cost (and the hardware also takes a lot of power/heat to manufacture). | |
| ▲ | Twirrim 5 days ago | parent [-] | | > For example Stack Overflow used to handle all their traffic from 9 on-prem servers (not sure if this is still the case). Millions of daily users. Power consumption and hardware cost is completely insignificant in this case. They just finished their migration to the cloud, unracked their servers a few weeks ago https://stackoverflow.blog/2025/07/16/the-great-unracking-sa... | | |
| ▲ | jononor 5 days ago | parent [-] | | Would have loved to get some more insights. Cost estimates, before and after, for example. But also if any architectural changes where needed, or what kind of other challenges and learnings they got from the migration. |
|
| |
| ▲ | tracker1 5 days ago | parent | prev [-] | | An "AI" box with a few high-end GPU/NPU cards takes more energy in a 4U box than an entire rack of commodity hardware. It's not nearly comparable... meaning entirely new and expensive infrastructure to support the power draw. That also doesn't count the need for really high-bandwidth networking to these systems, not to mention the insanely more expensive hardware itself. The infrastructure and hardware costs are seriously higher than for typical internet apps and storage. |
|
|
|
| ▲ | jm4 5 days ago | parent | prev | next [-] |
Blame the idiots who abused it. Like that guy who posted a video a couple weeks ago where he had like 6 instances going nonstop and was controlling them with his voice. There was some other project posted recently that was queuing up requests so that you could hit the limits in every time block. I've seen reddit posts where people were looking for others to share team accounts. It's always the morons who ruin a good thing. Unless/until I start having problems with limits, I'm willing to reserve judgment. On a max plan, I expect to be able to use it throughout my workday without hitting limits. Occasionally, I run a couple instances because I'm multitasking, and those were the only times I would hit limits on the 5x plan. I can live with that. I don't hit limits on the 20x plan. |
| |
| ▲ | jjani 4 days ago | parent [-] | | Such abusers are very rarely a whole 5% of accounts, almost certainly <=2%. | | |
| ▲ | npc_anon 4 days ago | parent [-] | | In a recent tweet the company gave an example of a user running Claude 24/7, costing them "tens of thousands of dollars". Each such user requires hundreds if not thousands of "normal" users just to break even. |
|
|
|
| ▲ | gedy 5 days ago | parent | prev | next [-] |
I get it, and feel the same way, but the current LLMs are very resource-intensive. To the point that I'm reluctant to go all in on these tools, in case we get rug-pulled down the line once companies admit "okay, this was not sustainable at that price.." |
| |
| ▲ | dust42 4 days ago | parent | next [-] | | I am really afraid of this as well. When using one of the plugins for VS Code, I would easily use a few million tokens a day, and I assume Claude Code isn't much different under the hood. The prices on OpenRouter are roughly $5/M for the better models: a few million tokens a day is on the order of $15/day, i.e. several hundred dollars a month, so paying $20/month can't be sustainable. Once enough developers are addicted to AI-assisted coding, the VCs will inevitably pull the rug. I wonder if Alibaba will put out a 100B-A10B coder model, which could probably run for $0.5/M while giving decent output. That would be easily affordable for most developers/companies. |
| ▲ | andix 5 days ago | parent | prev [-] | | Some people claim we already reached peak-LLM. It's cheap and powerful right now, in the future it might just get more expensive, or worse quality for the same price. |
|
|
| ▲ | NicuCalcea 5 days ago | parent | prev | next [-] |
| > I really am not a fan of having to feel like I'm using a limited resource whenever I use an LLM Well, it is a limited resource, I'm glad they're making that clear. |
| |
| ▲ | hn_throwaway_99 5 days ago | parent [-] | | Yeah, this sentence "I really am not a fan of having to feel like I'm using a limited resource whenever I use an LLM" felt like "I'm not a fan of reality" to me. Lots of things still have usage-based pricing (last I checked no gas stations are offering "all you can fill up" specials), and those things work out fine. | | |
| ▲ | NicuCalcea 5 days ago | parent [-] | | These subscriptions only work because lighter users subsidise heavier users, but I guess the really heavy users are such big outliers that the maths isn't working out for Anthropic. I'm really not liking this neo-rentier capitalism we find ourselves in. |
|
|
|
| ▲ | xtracto 5 days ago | parent | prev | next [-] |
This is actually so fascinating to me. I remember when we had metered and very expensive long-distance calls, "metered" dial-up Internet (Todito Card!), then capped DSL internet, then metered mobile calls and SMSes, and then metered mobile internet (that last one we still have). The stuff we do now, my 13-year-old self in 1994, dialing a 33.6 kbps modem and leaving it going the whole night to download one MP3, would never have dreamed of! It's exciting that nowadays we complain about bandwidth plans for intelligent agents! Can you imagine! I cannot imagine the stuff that will be built when this tech has the same availability as the Internet, or POTS! |
|
| ▲ | WD-42 5 days ago | parent | prev | next [-] |
| Try running one locally and observe as the temperature in your office rises a few degrees and the lights dim with every prompt. I didn’t really get the pricing myself until I got a desktop to do local inference. There’s a reason why these companies want to build nuclear plants next to their data centers. |
|
| ▲ | furyofantares 5 days ago | parent | prev | next [-] |
The Sonnet usage does not really look limited at 240-480 hours per week (a week only has 168 hours in it). Opus at 24-40 looks pretty good too. A little hard to believe they aren't still losing a bunch of money if you're using those limits, tbh. |
| |
| ▲ | clharman 5 days ago | parent | next [-] | | Pretty sure they are still losing money on it, which is great for us. And these limits wouldn't even be happening if there weren't people bragging about having their CC running constantly for 30 hours writing 2 million lines of (doubtless bad) code. And sharing accounts to try to get even MORE usage. It's all that swarm guy tbh and he's proud of it. | |
| ▲ | j45 5 days ago | parent | prev | next [-] | | The calculation of hours is a little tough to imagine sometimes: is it the inference time itself, or the wall-clock period of use? Is there an average token cost per hour of use (average or explicit)? | |
| ▲ | Filligree 4 days ago | parent [-] | | It had better be inference time; I regularly have Claude call out to tools that take hours to run. | | |
| ▲ | j45 3 days ago | parent [-] | | Yeah, I hope so too, but the silence might be an "it depends" |
|
| |
| ▲ | wrs 5 days ago | parent | prev [-] | | You can run multiple instances of Claude Code at the same time. | | |
| ▲ | furyofantares 5 days ago | parent [-] | | I know, I do it all the time! I wasn't calling out 240 hours as being impossible to hit in 168 hour weeks. I suppose "does not really look limited" could be read multiple ways - I did not mean it's literally unlimited, just that it doesn't look very limited. You can make your own comparison to however many hours you usually spend working in a week and how many sessions you have active on average during that time. |
|
|
|
| ▲ | beepbooptheory 5 days ago | parent | prev | next [-] |
I'm sure people here hate this mindset, but any time I use an LLM I just picture the thousands of fans spinning, the heat of the datacenter... I treat each prompt like a painful interval where I'm leaving my door open on a hot day. I know nobody else really cares... In some ways I wish I didn't think like this... But at this point it's not even an ethical thing; it's just a weird fixation. I can't help but feel we are all using ovens when we would be fine with toasters. |
|
| ▲ | sergiotapia 5 days ago | parent | prev | next [-] |
Think of an insane number of requests. Now 20x it: that's what the top 1% of Claude users are at, just bleeding the service dry. Hard problem; what else could Claude do, tbh. |
| |
| ▲ | handfuloflight 5 days ago | parent [-] | | ...maybe use their superintelligent AI to come up with a solution that specifically targets the abusers? | | |
| ▲ | SatvikBeri 5 days ago | parent | next [-] | | ...like adding limits that only affect a small fraction of users? | | | |
| ▲ | dom96 5 days ago | parent | prev | next [-] | | Easy, they just gotta hit up the AI on each request with a prompt like "You are an AI that detects abuse, if this request is abusive block it" /s | |
| ▲ | cyanydeez 5 days ago | parent | prev | next [-] | | [flagged] | |
| ▲ | blitzar 5 days ago | parent | prev [-] | | Claude says - The key is maintaining user agency—let them choose how to manage their usage rather than imposing arbitrary cutoffs. It suggests: Transparent queueing - Instead of blocking, queue requests with clear wait time estimates. Users can choose to wait or reschedule. Usage smoothing - Soft caps with gradually increasing response times (e.g., 2s → 5s → 10s) rather than hard cutoffs. Declared priority queues - Let users specify request urgency. Background tasks get lower priority but aren't blocked. Time-based scheduling - Allow users to schedule non-urgent work during off-peak hours at standard rates. Burst credits - Banking system where users accumulate credits during low usage periods for occasional heavy use. |
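Of those suggestions, "usage smoothing" is the easiest to picture. A toy sketch, with entirely made-up thresholds:

    # Past a soft cap, add a growing delay instead of rejecting requests outright.
    def throttle_delay_seconds(requests_this_window: int, soft_cap: int = 100) -> float:
        if requests_this_window <= soft_cap:
            return 0.0  # under the cap: no added latency
        overage = requests_this_window - soft_cap
        # delay grows from ~2s toward a 30s ceiling as the overage accumulates
        return min(2.0 * 1.5 ** (overage / 50), 30.0)

    print(round(throttle_delay_seconds(160), 1))  # ~3.3s at 60 requests over the cap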
|
|
|
| ▲ | belter 5 days ago | parent | prev | next [-] |
The real bottleneck isn't Jevons paradox, it's the Theory of Constraints. A human brain runs on under 20 W, yet every major LLM vendor is burning cash and running up against power-supply limits. If anything pops this bubble, it won't be ethics panels or model tweaks but subscription prices finally reflecting those electricity bills. At that point, companies might rediscover the ROI of good old meat-based AI. |
| |
| ▲ | alwillis 5 days ago | parent | next [-] | | > At that point, companies might rediscover the ROI of good old meat-based AI. That's like saying that when the price of gasoline gets too high, people will stop driving. Once a lifestyle is based on driving (like commuting from the suburbs to a job in the city), it's quite difficult, and in some cases impossible, to change without disrupting everything else. A gallon of gas is about 892% higher in 2025 than it was in 1970 (not adjusted for inflation), and yet most people in the US still drive. The benefits of LLMs are too numerous to put that genie back in the bottle. We're at the original Mac (128K of RAM, 9-inch B&W screen, no hard drive) stage of LLMs as a mainstream product. | |
| ▲ | belter 4 days ago | parent [-] | | > when the price of gasoline gets too high People get electric cars or public transport.... | | |
| ▲ | Nemo_bis 4 days ago | parent [-] | | Indeed > Adjusting for long-term ridership trends on each system, seasonal effects, and inertia (the tendency for ridership totals to persist from one month to the next), CBO estimates that the same increase of 20 percent in gasoline prices that affects freeway traffic volume is associated with an increase of 1.9 percent in average system ridership. That result is moderately statistically significant: It can be asserted with 95 percent confidence that higher gasoline prices are associated with increased ridership. https://www.cbo.gov/sites/default/files/110th-congress-2007-... |
|
| |
| ▲ | hkt 5 days ago | parent | prev | next [-] | | I suspect for this reason we are going to see a lot of attempts at applied AI: I saw an article semi-recently about an AI weather-forecasting model using considerably less power than its algorithmic predecessor, for instance. The answer is, as ever, to climb the value chain and make every penny (and joule) count. |
| ▲ | TeMPOraL 5 days ago | parent | prev | next [-] | | Where is this oft-repeated idea coming from? Inference isn't that expensive. | | |
| ▲ | belter 5 days ago | parent [-] | | My back-of-envelope estimate is that even a partly restricted plan would need to cost roughly $4,000–$4,500 per month just to break even. |
| |
| ▲ | dotancohen 5 days ago | parent | prev | next [-] | | > good old meat-based AI.
NI, or really just I. Though some of us might fall into the NS category instead. | |
| ▲ | margalabargala 5 days ago | parent | prev | next [-] | | Meat has far higher input requirements for good performance beyond raw energy. | |
| ▲ | belter 4 days ago | parent [-] | | Hire Vegan Developers... :-) | | |
| ▲ | ben_w 4 days ago | parent [-] | | I'm not sure they meant that, but they might have. An alternative reading is that we (i.e. "good old meat based AI") need more than just calories to make stuff. | | |
| ▲ | margalabargala 4 days ago | parent [-] | | > An alternative reading is that we (i.e. "good old meat based AI") need more than just calories to make stuff. That was what I meant; we require also things like shelter, emotional wellbeing, and more, to operate at top performance levels. |
|
|
| |
| ▲ | ben_w 4 days ago | parent | prev [-] | | > At that point, companies might rediscover the ROI of good old meat-based AI. I doubt this will look good for any party. The global electricity supply is 375 W/capita, and there's a lot of direct evidence, in the form of building new power plants, that the companies are electricity-limited. I have long observed the trends in renewable energy, but even assuming their rapid exponential growth continues, they can only roughly double this by 2032. If we simplify the discussion about the quality of LLM output to "about as good as a junior graduate", then the electricity bill can increase until the price curve of {the cost of supplying that inference} matches the price curve of {the cost of hiring a junior graduate}. If the electricity price is fixed, graduates can't earn enough to feed themselves. If the graduates earn the smallest possible amount of money needed to feed and house themselves in G7 nations, then normal people are priced out of heating/AC, and the street lights get turned off because municipalities can't cover the bill. If electricity for inference becomes as expensive as hiring Silicon Valley software engineering graduates, then normal people won't even be able to keep their phones charged. That said: > A human brain runs on under 20 W Only if you ignore the body it's attached to, which we cannot currently live without. And we also need a lot of time off: we start working at 21 and stop at just under 70 (so 5/8ths of our lives), the working week is 40 hours out of 168, and we need more time beyond that away from paid work for sickness and reproduction, and many of us also like holidays. Between all the capacity factors, for every hour (@20 W = 20 Wh) of the average American worker's brain being on a job, there's a corresponding average of about 1 kWh used by the bodies of various Americans. |
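A rough reconstruction of that 20 Wh → ~1 kWh claim, where every input is a ballpark assumption rather than a measurement:

    body_watts = 100          # ~2000 kcal/day of food energy per person
    working_life = 49 / 78    # ages 21-70 out of a ~78-year life (the 5/8ths above)
    working_week = 40 / 168   # hours on the job per week
    attendance = 0.85         # minus sickness, holidays, etc.

    worked_fraction = working_life * working_week * attendance  # ~0.13 of all person-hours
    print(body_watts / worked_fraction)  # ~790 Wh of body per worked brain-hour, i.e. order 1 kWh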
|
|
| ▲ | raincole 5 days ago | parent | prev | next [-] |
| Unlimited plans = the users who use it least subsidize the users who use it a lot. I don't really know how it's sustainable for something like SOTA LLMs. |
|
| ▲ | smcl 4 days ago | parent | prev | next [-] |
The problem is that this company is haemorrhaging money and cannot possibly offer an unlimited plan. |
| |
| ▲ | pluto_modadic 4 days ago | parent [-] | | *cannot possibly subsidize an unlimited plan, and must course-correct toward cost-plus pricing instead. | |
| ▲ | smcl 4 days ago | parent [-] | | Last year they had $900 million in revenue and ended up losing $5.6 billion. I suspect cutting off a few whales isn't enough to reverse that and they're gonna need to "course correct" a bit further |
|
|
|
| ▲ | volleygman180 5 days ago | parent | prev | next [-] |
| Agreed. The internet would be livid if Apple Music or Hulu limited how many hours you were allowed to stream per week. Especially the users who pay for the top-tier packages that include 4K (or lossless for music), extra channel add-ons, etc. |
|
| ▲ | vlan0 5 days ago | parent | prev [-] |
Do you wonder why that feeling arises inside of you? |