If you want LLMs to continue to be offered we have to get to a point where the providers are taking in more money than they are spending hosting them. And we still aren't there (or even close).

▲

hobom 2 days ago | parent | next [-]

They are taking in more than they are spending hosting them. However, the cost for training the next generation of models is not covered.

▲

bandrami 2 days ago | parent [-]

Nope. They're losing money on straight inference (you may be thinking of the interview where Dario described a hypothetical company that was positive margin). The only way they can make it look like they're making money on inference is by calling the ongoing reinforcement training of the currently-served model a capital rather than operational expense, which is both absurd and will absolutely not work for an IPO.

▲

wild_egg 2 days ago | parent | next [-]

Inference, in and of itself, can't be completely unprofitable. Unless you're purely talking about Anthropic?

But

> If you want LLMs to continue to be offered we have to get to a point where the providers are taking in more money than they are spending hosting them

Suggests you just mean in general, as a category, every provider is taking a loss. That seems implausible. Every provider on OpenRouter is giving away inference at a loss? For what purpose?

	▲	bandrami a day ago | parent [-]
		For the same reason that Amazon operated at a loss for two decades and Uber operated at a loss for a decade and a half. The problem is the free money hose isn't running anymore.

▲

victorbjorklund 15 hours ago | parent | prev | next [-]

I really doubt that, since their prices are even higher than what no-name hosts on OpenRouter etc. charge.

▲

dgellow 2 days ago | parent | prev [-]

Do you have sources? I would be interested to read them

	▲	bandrami 2 days ago | parent [-]
		Probably the best roundup is Ed Zitron at https://wheresyoured.at. Half the articles are paywalled, but the free ones outline the financial situation of the SOTA providers, and he has receipts.

▲

quikoa 2 days ago | parent | prev | next [-]

The open models may not be as great, but maybe they are good enough. AI users can switch when prices rise, before it becomes sustainable for (some of) the large LLM providers.

▲

Gigachad 2 days ago | parent [-]

Currently it costs so much more to host an open model than it costs to subscribe to a much better hosted model, which suggests it’s still being massively subsidised.

▲

finaard 2 days ago | parent | next [-]

For a lot of tasks smaller models work fine, though. Nowadays the problem is less model quality/speed and more that it's a bit annoying to mix them into one workflow with easy switching.

I'm currently making an effort to switch to local for stuff that can be local - initially standalone tasks, longer term a nice harness for mixing. One example would be OCR/image description - I have hooks from dired to throw an image to local translategemma 27b, which extracts the text, translates it to English if necessary, adds a picture description, and - if it feels like it - extra context. Works perfectly fine on my MacBook.

Another example would be generating documentation - local qwen3 coder with a 256k context window does a great job at going through a codebase to check what is and isn't documented, and prepare a draft. I still replace pretty much all of the text - but it's good at collecting the technical details.

	▲	pbronez 2 days ago | parent [-]
		I haven’t tried it yet, but Rapid MLX has a neat feature for automatic model switching. It runs a local model using Apple’s MLX framework, then “falls forward” to the cloud dynamically based on usage patterns:
		> Smart Cloud Routing
		>
		> Large-context requests auto-route to a cloud LLM (GPT-5, Claude, etc.) when local prefill would be slow. Routing based on new tokens after cache hit.
		--cloud-model openai/gpt-5 --cloud-threshold 20000
		https://github.com/raullenchai/Rapid-MLX
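The routing rule quoted above ("new tokens after cache hit" compared against a threshold) can be sketched roughly as follows. This is not Rapid MLX's actual code: the function name `route`, its parameters, and the strict `>` comparison are all assumptions made for illustration; only the 20000 default comes from the quoted `--cloud-threshold` flag.

```python
def route(prompt_tokens: int, cached_tokens: int, cloud_threshold: int = 20_000) -> str:
    """Pick a backend from the number of uncached prompt tokens.

    Hypothetical sketch: the real tool may count tokens or compare
    against the threshold differently.
    """
    # Only tokens not already covered by the prefix cache need local prefill.
    new_tokens = max(prompt_tokens - cached_tokens, 0)
    return "cloud" if new_tokens > cloud_threshold else "local"


# 30k-token prompt with 5k cached leaves 25k new tokens: goes to the cloud.
print(route(30_000, 5_000))
# Same prompt with 25k cached leaves 5k new tokens: stays local.
print(route(30_000, 25_000))
```

The point of counting only new tokens is that a long conversation with a warm cache can stay local even when the total context is large.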

▲

stingraycharles 2 days ago | parent | prev | next [-]

You can use open models through OpenRouter, but if you want good open models they get pretty expensive fairly quickly as well.

▲

layoric 2 days ago | parent [-]

I've found MiniMax 2.7 pretty decent, and even pay-as-you-go on OpenRouter, at $0.30 per million tokens in and $1.20 per million tokens out, you can get some pretty heavy usage for $5-$10. Their token subscription is heavily subsidized, but even if it goes up or away, it's pretty decent. I'm pretty hopeful that these open-weight models will become affordable at good enough performance.
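For a rough sense of that arithmetic, here is a minimal sketch using the two prices quoted above. The workload (20M tokens in, 2M tokens out) is a made-up example, not a measurement, and the function name `usage_cost` is just for illustration.

```python
# Prices quoted in the comment above, in USD per million tokens.
INPUT_USD_PER_MTOK = 0.30
OUTPUT_USD_PER_MTOK = 1.20


def usage_cost(input_tokens: int, output_tokens: int) -> float:
    """Pay-as-you-go cost in USD for the given token counts."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical workload: 20M tokens in, 2M tokens out.
print(f"${usage_cost(20_000_000, 2_000_000):.2f}")  # prints $8.40
```

At these rates input tokens dominate the bill for agent-style usage, since prompts are usually far larger than completions.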

	▲	stingraycharles 2 days ago | parent [-]
		It’s okay, but compared to e.g. Sonnet it’s so far off the mark so often that I cannot use it.

▲

ericd 2 days ago | parent | prev | next [-]

Efficiency goes way up with concurrent requests, so not necessarily subsidy, could just be economy of scale.

▲

JumpCrisscross 2 days ago | parent | prev [-]

If I drop $10k on a souped-up Mac Studio, can that run a competent open-source model for OpenClaw?

▲

pbronez 2 days ago | parent | next [-]

The Rapid MLX team has done some interesting benchmarking that suggests Qwopus 27B is pretty solid. Their tool includes benchmarking features so you can evaluate your own setup.

They have a metric called Model-Harness Index:

MHI = 0.50 × ToolCalling + 0.30 × HumanEval + 0.20 × MMLU (scale 0-100)

https://github.com/raullenchai/Rapid-MLX
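The MHI formula above is just a weighted average, which can be written out as below. The weights are the ones quoted; the function name, the range check, and the sample scores are hypothetical and not taken from the Rapid MLX benchmarks.

```python
def model_harness_index(tool_calling: float, human_eval: float, mmlu: float) -> float:
    """Weighted score on a 0-100 scale, using the weights quoted above."""
    for score in (tool_calling, human_eval, mmlu):
        if not 0 <= score <= 100:
            raise ValueError("each component score must be in the range 0-100")
    return 0.50 * tool_calling + 0.30 * human_eval + 0.20 * mmlu


# Made-up component scores, purely to show the weighting.
print(round(model_harness_index(80, 70, 60), 2))  # prints 73.0
```

Half the weight sits on tool calling, so two models with similar HumanEval and MMLU numbers can still end up far apart on this index.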

	▲	JumpCrisscross 2 days ago | parent [-]
		Pardon the silly question, but why do I need this tool versus running the model directly (and SSH’ing in when I’m away from home)?

▲

Atotalnoob 2 days ago | parent | prev [-]

Qwen is probably your best bet…

Edit: I’d also consider waiting for WWDC. They are supposed to be launching the new Mac Studio, and even if you don’t get it, you might be able to snag older models for cheaper.

	▲	JumpCrisscross 2 days ago | parent | next [-]
		> consider waiting for WWDC
		100% agree. I’m just looking forward to setting something up in my electronics closet that I can remote into instead of having everything tracked.
	▲	storus 2 days ago | parent | prev [-]
		Latest rumors are no Mac Studio until at least October.

▲

Larrikin 2 days ago | parent | prev | next [-]

It is nobody's responsibility to ensure billion-dollar companies are profitable. Use them until local models are good enough.

▲

lynx97 2 days ago | parent | prev | next [-]

I see the current situation as a plus. I get SOTA models at dumping prices. And once the public providers raise their prices, I will be able to switch to local AI because open models will have improved so much.

▲

baruch 2 days ago | parent | prev | next [-]

If they started doing caching properly and using proper sunrooms for that they'd have a better chance with that

	▲	bandrami 2 days ago | parent [-]
		If my empty plate had a pizza on it it would be a good lunch

▲

carefree-bob 2 days ago | parent | prev | next [-]

I think this has to be done with technological advances that make things cheaper, not by charging more.

I understand why they have to charge more, but not many are gonna be able to afford even $100 a month, and that doesn't seem to be sufficient.

It has to come with some combination of better algorithms or better hardware.

▲

bandrami 2 days ago | parent | next [-]

Making it more affordable would be very bad news for Amazon, who are now counting on $100B in new spending from OpenAI over the next 10 years.

▲

philipwhiuk 2 days ago | parent | next [-]

Someone's going to get burned here, that's for sure. This isn't going to end with every person on the planet paying $100 a month for an LLM.

	▲	LtWorf 2 days ago | parent [-]
		A guy from Meta being interviewed on the BBC a few years ago claimed that every schoolchild in India was going to have metaverse VR or they'd be left behind in their education, so every family was certainly going to pony up the money.

▲

throwthrowuknow 2 days ago | parent | prev [-]

Something's not adding up. Why is Amazon making financial plans for the next decade based on continued OpenAI spending, when you’re saying AI providers like OpenAI and Anthropic aren’t even close to being profitable? How can they last a decade or more?

Who’s wrong?

▲

bandrami 2 days ago | parent | next [-]

I take it you don't remember 2008

▲

arcanemachiner 2 days ago | parent [-]

Are we before or after the part where they start throwing money out of helicopters?

	▲	bandrami 2 days ago | parent [-]
		That's the interesting question, right? Because if this unwinds during a period of external inflation (say, because of a big war and an energy shortage), then even Bernanke would say helicopter money won't work.

▲

Gigachad 2 days ago | parent | prev [-]

They probably aren’t planning on making the money on consumer subscriptions. Any price is viable as long as the user can get more value out of it than they spend.

	▲	bandrami 2 days ago | parent [-]
		"Sell this for less than it cost us" was a viable business plan during the ZIRP era but is not now

▲

vegnus 2 days ago | parent | prev | next [-]

I'll take local models over these corporate ones any day of the week. Hopefully it's only a matter of time

▲

holoduke 2 days ago | parent | prev | next [-]

As with all new products, it takes time to let the market do its work. See it from the positive side: the demand for more, faster, and bigger hardware is finally back after 15 years of dormancy. Finally we can see 128GB of default memory or 64GB video cards two years from now.
