| ▲ | bluehatbrit 2 days ago |
| Token-based pricing generally makes a lot of sense for companies like Zed, but it sure does suck for forecasting spend. Usage pricing on something like AWS is pretty easy to figure out: you know what you're going to use, so you do some simple arithmetic and you've got a pretty accurate idea. Even with serverless it's pretty easy. Tokens are so much harder, especially in a development setting. It's so hard to have any reasonable forecast of how a team will use it and how many tokens will be consumed. I'm starting to track my usage with a bit of a breakdown in the hope that I'll find a somewhat reliable trend. I suspect this is going to be one of the next big areas in cloud FinOps. |
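A minimal sketch of that kind of breakdown, assuming you can export per-request token counts from your provider's usage logs (the CSV column names and the per-million-token prices below are made-up assumptions, not any vendor's real rates):

    import csv
    from collections import defaultdict

    # Hypothetical prices per million tokens; substitute your vendor's real rates.
    PRICE_PER_M = {"input": 3.00, "output": 15.00}

    def spend_by_day(usage_csv):
        """Sum estimated spend per day from a usage export with (assumed)
        columns: date, input_tokens, output_tokens."""
        totals = defaultdict(float)
        with open(usage_csv, newline="") as f:
            for row in csv.DictReader(f):
                cost = (int(row["input_tokens"]) / 1e6 * PRICE_PER_M["input"]
                        + int(row["output_tokens"]) / 1e6 * PRICE_PER_M["output"])
                totals[row["date"]] += cost
        return dict(totals)

    # print(spend_by_day("usage.csv"))  # e.g. {"2026-01-05": 12.41, ...}

Even a rough per-day or per-project series like this makes the trend (and the variance) visible, which is most of what a forecast needs.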
|
| ▲ | garrickvanburen 2 days ago | parent | next [-] |
| My rant on token-based pricing is primarily based on the difficulty in consistently forecasting spend... and also that the ongoing value of a token is controlled by the vendor... "the house always wins": https://forstarters.substack.com/p/for-starters-59-on-credit... |
| |
| ▲ | coder543 2 days ago | parent [-] | | There are enough vendors that it's difficult for any one vendor to charge too much per token. There are also a lot of really good open-weight models that your business could self-host if the hosted vendors all conspire to charge too much per token. (I believe it's only economical to self-host big models if you're using a lot of tokens, so there is a breakeven point.) |
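A rough sketch of that break-even, with entirely made-up numbers (hosted per-token price, GPU rental cost, and self-hosted throughput all vary widely in practice):

    # Back-of-the-envelope break-even for self-hosting vs. a hosted API.
    # Every number here is an illustrative assumption, not real pricing.
    hosted_price_per_m = 10.0                # $ per million tokens from a hosted vendor
    gpu_cost_per_hour = 25.0                 # $ to rent a multi-GPU node
    self_hosted_tokens_per_hour = 5_000_000  # sustained throughput on that node

    at_full_load = gpu_cost_per_hour / (self_hosted_tokens_per_hour / 1e6)
    print(f"self-hosted at 100% utilization: ${at_full_load:.2f} per million tokens")

    # The catch: the node bills by the hour whether or not it's busy, so the
    # effective cost depends on how much of the time you keep it saturated.
    utilization = 0.3
    effective = at_full_load / utilization
    print(f"at {utilization:.0%} utilization: ${effective:.2f} per million tokens "
          f"({'cheaper' if effective < hosted_price_per_m else 'pricier'} than hosted)")

Under assumptions like these, self-hosting only wins once utilization is high enough, which is the break-even point the comment describes.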
|
|
| ▲ | prasoon2211 2 days ago | parent | prev | next [-] |
| This is partially why, at least for LLM-assisted coding workloads, orgs are going with the $200 / mo Claude Code plans and similar. |
| |
| ▲ | jsheard 2 days ago | parent [-] | | Until the rug inevitably gets pulled on those as well. It's not in your interest to buy a $200/mo subscription unless you use >$200 of tokens per month, and long term it's not in their interest to sell you >$200 of tokens for a flat $200. | | |
| ▲ | sellyme 2 days ago | parent | next [-] | | > It's not in your interest to buy a $200/mo subscription unless you use >$200 of tokens per month This is only true if you can find someone else selling them at cost. If a company has a product that costs them $150 but that they would ordinarily sell piecemeal for a total of $250, getting a stable recurring purchase at $200 might be worthwhile to them while still being a good deal for the customer. | |
| ▲ | Hamuko 2 days ago | parent | prev | next [-] | | The pricing model works as long as people (on average) think they need >$200 worth of tokens per month but actually use somewhat less, say $170/month. Is that happening? No idea. | | |
| ▲ | jsheard 2 days ago | parent | next [-] | | Maybe that is what Anthropic is banking on; from what I gather, they obscure Max accounts' actual token spend, so it's hard for subscribers to tell whether they're getting their money's worth. https://github.com/anthropics/claude-code/issues/1109 | |
| ▲ | hombre_fatal 2 days ago | parent | prev | next [-] | | Well, the $200/mo plan model works as long as the $100/mo plan is insufficient for some people, which in turn works as long as the $17/mo plan is insufficient for some people. I don't see how it matters to you that you aren't saturating your $200 plan. You have it because you hit the limits of the $100/mo plan. | |
| ▲ | KallDrexx 2 days ago | parent | prev | next [-] | | I don't know about people using CC on a regular basis, but according to `ccusage`, I can trivially go over $20 of API credits in a few days of hobby use. I'd presume that if you're paying for a $200 plan, you know you have heavy usage and can easily exceed that. | |
| ▲ | jopsen 2 days ago | parent | prev [-] | | It's probably easier (and hence cheaper) to finance the AI infrastructure investments if you have a lot of recurring subscriptions. There is probably a lot of value in predictability. Meaning it might be viable, on a $200 plan, to offer more than $200 worth of tokens. |
| |
| ▲ | baq 2 days ago | parent | prev [-] | | meanwhile me hiding from accounting for spending $500 on cursor max mode in a day | | |
| ▲ | typpilol 2 days ago | parent [-] | | Did you actually get 500 bucks worth of work out of it? | | |
| ▲ | GoatInGrey 2 days ago | parent | next [-] | | How should they know? It's not like they're checking what it does. | |
| ▲ | baq 2 days ago | parent | prev | next [-] | | No way to measure it directly, but it did write 4k LOC of mostly working Angular... whether non-max would manage the same feat in the same time is an open question. | |
| ▲ | joegibbs 2 days ago | parent | prev [-] | | It depends on the salary, right? If you're in Silicon Valley paying 500k TC it probably makes sense to let your employees go wild and use as much token spend as they like. |
|
|
|
|
|
| ▲ | Spartan-S63 2 days ago | parent | prev | next [-] |
| > I suspect this is going to be one of the next big areas in cloud FinOps. It already is. There’s been a lot of talk and development around FinOps for AI and the challenges that come with it. For companies, forecasting token usage and AI costs is non-trivial for internal purposes. For external products, what’s the right unit economics: $/token, $/agentic execution, etc.? The former is detached from customer value; the latter is hard to track and will have lots of variance. With how variable output (and input) sizes can be, it’s a tricky space to really get a grasp on at this point in time. It’ll become a solved problem, but right now, it’s the Wild West. |
|
| ▲ | scuff3d 2 days ago | parent | prev | next [-] |
| Also seems like a great idea to create a business model where the companies aren't incentivised to provide the best product possible. Instead they'll want to create a product just useful enough to not drive away users, but just useless enough to tempt people to go up a tier: "I'm so close, just one more prompt and it will be right this time!" Edit: To be clear, I'm not talking about Zed. I'm talking about the companies making the models. |
| |
| ▲ | GoatInGrey 2 days ago | parent | next [-] | | As well as gatekeeping functionality behind the prompt box. Want to find and replace? Regex? Insert a new column? Add a line break? Have the AI do it and pay us for those tokens whether it works the first time or not! I have unfortunately seen many AI-based tools demoed with this approach. The goal is clearly to monetize every user action while piggybacking off of models provided by a third party. The gross thing is that leadership from the director level up LOVES these demos, even when the models very clearly fuck up in the demo. AI: "I have cleaned the formatting for all 4,650 records in your sample XML files. Let me know if there's anything else I can do to help!" Me: "There are over 25,000 records in that data..." AI: "You're absolutely right!" | |
| ▲ | potlee 2 days ago | parent | prev [-] | | While Apple is incentivized to ship a smaller battery to cut costs, it is also incentivized to make its software as efficient as possible to get the best use out of the battery it does ship. | |
|
|
| ▲ | mdasen 2 days ago | parent | prev | next [-] |
| I agree that tokens are a really hard metric for people. I think most people are used to getting something with a certain amount of capacity per time and dealing with that. If you get a server from AWS, you're getting a certain amount of capacity per time. You still might not know what it's going to cost you to do what you want - you might need more capacity to run your website than you think. But you understand the units that are being billed to you and it can't spiral out of control (assuming you aren't using autoscaling or something). When you get Claude Code's $20 plan, you get "around 45 messages every 5 hours". I don't really know what that means. Does that mean I get 45 total conversations? Do minor followups count against a message just as much as a long initial prompt? Likewise, I don't know how many messages I'll use in a 5-hour period. However, I do understand when I start bumping up against limits. If I'm using it and start getting limited, I understand that pretty quickly - in the same way that I might understand a processor being slower and having to wait for things. With tokens, I might blow through a month's worth of tokens in an afternoon. On one hand, it makes more sense to be flexible for users. If I don't use tokens for the first 10 days, they aren't lost. If I don't use Claude for the first 10 days, I don't get 2,160 message credits (48 five-hour windows × 45 messages) banked up. Likewise, if I know I'm going on vacation later, I can't use my Claude messages in advance. But it's just a lot easier for humans to understand bumping up against rate limits over a more finite period of time and get an intuition for what they need to budget for. |
| |
| ▲ | Filligree 2 days ago | parent [-] | | Both prefill and decode count against Claude’s subscriptions; a conversation’s total token cost is roughly N^2 in its length. My mental model is they’re assigning some amount of API credits to the account and billing the same way as if you were using tokens, shutting off at an arbitrary point. The point also appears to change based on load / time of day. |
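A toy sketch of why that's roughly N^2: each turn re-sends the whole conversation history as prefill before generating a reply, so billed tokens grow quadratically with turn count (the per-turn sizes below are arbitrary assumptions):

    # Toy model: every turn re-reads the full history (prefill) and then
    # generates a reply (decode). Per-turn sizes are arbitrary assumptions.
    tokens_per_message = 200
    tokens_per_reply = 600

    history = 0
    total_billed = 0
    for turn in range(1, 21):
        prefill = history + tokens_per_message  # everything so far, re-read each turn
        decode = tokens_per_reply
        total_billed += prefill + decode
        history = prefill + decode

    print(total_billed)  # 20 turns cost far more than 2x what 10 turns cost

Which is also why a long-running session tends to burn through a "messages per 5 hours" budget much faster near the end than at the start.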
|
|
| ▲ | jklinger410 2 days ago | parent | prev [-] |
| Token-based pricing works for the company, but not for the user. |