I think the biggest problem is not necessarily the cost to develop & serve the models, but how quickly user behavior changed with token-based pricing.

I know a lot of people at companies where the marching orders changed on a dime at the end of Q1/start of Q2. These are shops that were fully on the "use AI or die (because we will fire you)" train.

Now there's monitoring, reporting, alerting not just on overall cost but on "over-use" of best/priciest models based on total-or-percent tokens/dollars, etc. All of this comes with direct developer engagement & standardized management escalation for holding it wrong.
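A minimal sketch of what that kind of "over-use" check might look like, assuming a flat list of usage records. The record layout, the model labels, and both thresholds are hypothetical, not any vendor's or company's actual reporting schema.

```python
from collections import defaultdict

# Hypothetical usage records: (developer, model, dollars spent).
USAGE = [
    ("alice", "premium", 120.0),
    ("alice", "cheap", 30.0),
    ("bob", "premium", 10.0),
    ("bob", "cheap", 90.0),
]


def flag_overuse(records, premium_model="premium", max_share=0.5, max_dollars=100.0):
    """Return developers whose premium-model spend exceeds a share or dollar cap.

    Both caps are illustrative assumptions; a real policy would pick its own.
    """
    total = defaultdict(float)
    premium = defaultdict(float)
    for dev, model, dollars in records:
        total[dev] += dollars
        if model == premium_model:
            premium[dev] += dollars

    flagged = {}
    for dev, spent in total.items():
        share = premium[dev] / spent if spent else 0.0
        # Flag on either the percent-of-spend rule or the absolute-dollar rule.
        if share > max_share or premium[dev] > max_dollars:
            flagged[dev] = {"premium_dollars": premium[dev], "premium_share": share}
    return flagged


if __name__ == "__main__":
    for dev, stats in flag_overuse(USAGE).items():
        print(dev, stats)
```

The same shape works for tokens instead of dollars; only the unit of the third field changes.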

To me this customer behavior does not smell like a product you can 10x the pricing on to get profitable. We have exited the exploration phase and now ROI matters.

burningChrome 4 hours ago | parent | next [-]

I can give you some additional anecdotal evidence to support your comment.

I work at a Fortune 200 company. At first, it was the Wild West. Need an LLM? You got it. Need or want to build an army of agents? Done and done. We literally had everything at our fingertips for about 3 months. Teams were building their own internal tools, and the team I work on canceled contracts with several software vendors because teams were building the same tools for what they thought was nothing.

Then they signed contracts with Anthropic and Google, I assume because they saw the token usage was through the roof. One month later? They completely cut off access for everybody to both Claude and Gemini. If you wanted access? Suddenly it was several forms, along with several approvals and a rock-solid business case for why you needed it. And before you got to the forms? You were added to a waiting list that was thousands of people long.

The entire company is now in damage control, trying to get the genie back in the bottle. I'm guessing someone saw how much we would be paying for the tokens we'd been using and decided to shut the party down, so to speak.

sdesol 4 hours ago | parent [-]

Were there at least performance gains to be measured?

burningChrome 4 hours ago | parent [-]

AFAIK nobody was collecting analytics. The one team I was working on had put out a goal of "30% more efficient" using AI tools. It's about as subjective as you can get. We never got around to defining what exactly that meant before everything got shut down. Several other devs and I were laughing about the whole thing. The company was so amped about what AI could do that they never even bothered collecting any analytics that would confirm or refute that any of this had a positive impact. Even some of my team members were talking about the placebo effect AI has had on a lot of C-Suite folks.

dranudin 5 hours ago | parent | prev | next [-]

I can second this. Our company and department were all-in on AI. Since the token-based pricing came in, we got an email from IT that tried to explain that most developers don't know how to choose models and that the cheap models should be good enough for most of our work...

verdverm 4 hours ago | parent [-]

Have they built an internal AI enablement team?

dranudin 4 hours ago | parent [-]

Yes :D

piker 5 hours ago | parent | prev | next [-]

I.e., the demand for programming tokens turns out to be quite elastic.

steveBK123 5 hours ago | parent | next [-]

I would imagine it only gets worse in the face of good-enough open/Chinese/local models too, right?

Microsoft is adding DeepSeek support already, as I recall?

That is, for any definition of "they are behind X months", eventually they get to the point Claude was at in January when the world freaked out, but at 1/10th the cost. A lot of firms are going to mandate that that is good enough for their developers.

sdesol 5 hours ago | parent | next [-]

> Microsoft adding DeepSeek support

I believe this hasn't been confirmed yet, but I think it speaks to a bigger problem for the AI companies: if you give capable developers a good reasoning LLM, they can make it work like a really expensive model.

I believe we are 100% at the stage of good enough for the vast majority of tech companies. Fable and others will be more valuable for non-traditional tech companies.

I read somewhere that the Chinese AI companies are sharing knowledge, and it would not surprise me if the government is applying pressure by saying work together or else. If they work together, they can truly commoditize LLMs, and with China ramping up hardware support for AI, I see the future being inference speed and hardware being the moat.

thewebguyd 5 hours ago | parent [-]

If hardware becomes the moat, the US frontier labs are screwed. We have AWS, Azure, GCP. All three have or are making inference silicon. LLMs become just another service in the public cloud's large service catalog, and open weight wins.

Which makes sense to me. Selling a chatbot interface/model access to the general public was never going to be a viable long-term play. You still need developers to wrap the models into specialized tools. Cue the Jobs quote "It's a feature, not a product."

KolibriFly 2 hours ago | parent | next [-]

The funniest thing would be if in a couple years LLMs just end up being another checkbox next to PostgreSQL and Kubernetes

thewebguyd 39 minutes ago | parent [-]

I don't think that's far-fetched at all, and it's probably the end game ultimately. No one wants to buy a chatbot; they want to automate something with it. Intelligence is just another PaaS offering right next to storage and compute. The only hiccup is whether the US Gov will let Anthropic and/or OpenAI fail when that time comes.

sdesol 4 hours ago | parent | prev [-]

The big thing is, the western world has moved so much of the manufacturing to China, and I think a lot of people will not forgive Samsung and others, so I can see China owning a good portion of the supply chain.

johnvanommen 4 hours ago | parent [-]

> The big thing is, the western world has moved so much of the manufacturing to China

I built my career on Solaris and it got rugpulled by Linux.

That wasn’t because of software, it was because of hardware. Linux’s cost advantage existed because Sun hardware had huge margins, because their software was basically free.

AI will probably be a repeat of this. Whoever can come up with the hardware solution that minimizes the cost per token will win.

I believe the 5090 still holds this crown, but someone certainly knows better than I do.

rescbr 3 hours ago | parent | next [-]

While people fly to the US to buy Macs at a lower price and bring them back in their backpacks, I guess I'll be flying to HK to buy a Chinese GPU sooner rather than later...

trollbridge 3 hours ago | parent | prev | next [-]

Fortunately, Solaris skills map to Linux pretty cleanly.

fragmede 3 hours ago | parent | prev [-]

But not all tokens are equal, and vertical integration is the name of the game. Solaris did not lose to Linux; it lost to the LAMP stack on commodity x86 hardware. Without the "AMP" part, Linux would've been dead in the water.

CuriouslyC 5 hours ago | parent | prev | next [-]

100%. There will be strict quotas on the expensive models and day to day work will be done on the cheap models that are "good enough" with escalation to the metered models when the cheaper options are spinning their wheels. Eventually the US frontier lab APIs will only get the most heavily triaged work that multiple tiers of cheaper Chinese open weight models have failed on.

And of course the C-suite will have unlimited access to Mythos tier models, which they'll use to summarize reports, while passing down mandates to rank and file to increase usage of less expensive models.
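The quota-plus-escalation flow described above could be sketched roughly like this. Everything here is an assumption for illustration: the tier names, the per-call costs, and the idea that a model signals failure by returning `None`.

```python
def escalate(task, tiers, budget):
    """Try tiers cheapest-first; stop at the first answer or when the budget runs out.

    `tiers` is a list of (name, cost_per_call, solve) triples, ordered from
    cheapest to most expensive. `solve(task)` returns an answer, or None when
    the model fails on the task. Returns (tier_name, answer, dollars_spent).
    """
    spent = 0.0
    for name, cost, solve in tiers:
        if spent + cost > budget:
            break  # the quota would be exceeded, so do not escalate further
        spent += cost
        answer = solve(task)
        if answer is not None:
            return name, answer, spent
    return None, None, spent


# Hypothetical stand-ins for real model calls.
TIERS = [
    ("cheap-open-weight", 0.01, lambda t: None if "hard" in t else "ok"),
    ("frontier-metered", 1.00, lambda t: "ok"),
]

if __name__ == "__main__":
    print(escalate("easy task", TIERS, budget=5.0))
    print(escalate("hard task", TIERS, budget=5.0))
    print(escalate("hard task", TIERS, budget=0.5))
```

Note that failed attempts on cheap tiers still cost money, which is why the sketch tracks cumulative spend rather than only the cost of the tier that finally answers.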

verdverm 4 hours ago | parent | prev [-]

Yup, we are in the process of getting access to US-hosted Chinese models. I've been petitioning Google and our rep. We will see, but I suspect they will cave eventually. Gemini sucks, and if they don't sell what their customers want, we go shopping around.

jayd16 5 hours ago | parent | prev [-]

If folks won't pay a higher price, doesn't that mean it's inelastic?

unholiness 5 hours ago | parent | next [-]

"Elastic" in economics refers to how much the quantity supplied or demanded changes when the price changes (not vice versa, as you're describing). So, e.g., inelastic demand means the quantity demanded changes very little when the price doubles.

steveBK123 5 hours ago | parent | prev [-]

Elastic demand means buyers are highly sensitive to price; a price hike causes a massive drop in purchases. Inelastic demand means buyers aren’t very sensitive; they keep buying regardless of price.
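To put numbers on the definition above, here is a small sketch using the midpoint (arc) elasticity formula, which is one common way to compute it. The quantities and prices in the example are made up for illustration, not data from any company.

```python
def arc_elasticity(q_old, q_new, p_old, p_new):
    """Midpoint (arc) price elasticity of demand.

    Percent change in quantity divided by percent change in price, with each
    percent change measured against the average of the old and new values.
    """
    pct_q = (q_new - q_old) / ((q_new + q_old) / 2)
    pct_p = (p_new - p_old) / ((p_new + p_old) / 2)
    return pct_q / pct_p


def classify(e):
    """Label demand by the magnitude of its elasticity."""
    magnitude = abs(e)
    if magnitude > 1:
        return "elastic"
    if magnitude < 1:
        return "inelastic"
    return "unit elastic"


if __name__ == "__main__":
    # Illustrative only: the price doubles in both cases.
    big_drop = arc_elasticity(q_old=100, q_new=20, p_old=1.0, p_new=2.0)
    small_drop = arc_elasticity(q_old=100, q_new=95, p_old=1.0, p_new=2.0)
    print(round(big_drop, 3), classify(big_drop))
    print(round(small_drop, 3), classify(small_drop))
```

In the first case usage collapses when the price doubles, so the magnitude is well above 1 (elastic); in the second, usage barely moves, so it is well below 1 (inelastic).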

jayd16 5 hours ago | parent [-]

Ah, alright, I have it backwards then.

ofjcihen 4 hours ago | parent | prev [-]

I do a lot of client work for Fortune 100s.

Over the last month I have seen companies scrambling to measure deliverables against cost. Most of the back-room talk is to the effect of giving devs a small allowance ($500 a month) and then making them prove their own productivity increases (again, based on deliverables, not LoC) before they either take it away or give them more.

Obviously this won’t be on an individual basis but by some kind of unit.

Either way, with how much I see these companies cutting back I have no idea how the big AI companies are going to be profitable.