I don't think anyone has a firm grasp on actual inference costs -- including the research and training that has gone into those models. We've got near-frontier capabilities from open source models from China at pennies on the dollar compared to US big tech rollouts. OpenAI and Anthropic are heavily subsidizing their inference -- no wait, they are charging the most they can get away with before going public. Where is the truth?

andrewmutz 2 hours ago | parent | next [-]

Both can be true. They can be charging what the market will bear, and still be charging less than their costs of running it.


InsideOutSanta an hour ago | parent | prev | next [-]

> I don't think anyone has a firm grasp on actual inference costs -- including the research and training that has gone into those models

We know roughly how much these companies spend and what their revenues are. Based on that, they'd have to more than double revenue (without spending more money) just to stay even, and that's not good enough given how deep in the hole they are.

> OpenAI and Anthropic are heavily subsidizing their inference -- no wait, they are charging the most they can get away with before going public. Where is the truth?

Both are true. I mean, I'd be willing to spend a bit more than I do now, but not more than double, and neither would most companies. The company I work for is currently investigating how to reduce LLM spend, not looking to spend more.

dontlikeyoueith an hour ago | parent | prev | next [-]

> OpenAI and Anthropic are heavily subsidizing their inference -- no wait, they are charging the most they can get away with before going public. Where is the truth?

Both. They are charging the most they can get away with, and that amount is still heavily subsidized by VC money.

pimeys an hour ago | parent | prev | next [-]

We pay by token at work. I just finished one session with Opus that cost 4000 dollars, over about three days.

Now that 200 USD subscription starts to feel cheap...

zozbot234 40 minutes ago | parent | next [-]

That would be ~300 tok/s over 72 hours at Claude Fable output token prices? I'm not sure that this passes a sanity test.
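
For anyone who wants to redo that sanity check, here is a minimal Python sketch of the arithmetic. It assumes the whole bill was output tokens generated at one steady rate, which is a simplification (input and cached tokens are billed too). The function names are hypothetical and no real price list is used; the sketch only back-solves the per-million-token price that $4000, 72 hours, and ~300 tok/s jointly imply.

```python
# Back-of-the-envelope check of the numbers quoted in the thread.
# Assumption: all spend is output tokens at a constant rate.
# Function names are hypothetical, made up for this illustration.

SECONDS_PER_HOUR = 3600


def total_tokens(tokens_per_second: float, hours: float) -> float:
    """Tokens produced at a steady rate over the given number of hours."""
    return tokens_per_second * hours * SECONDS_PER_HOUR


def implied_price_per_million(total_usd: float, hours: float,
                              tokens_per_second: float) -> float:
    """USD per million tokens implied by a bill, a duration, and a rate."""
    return total_usd / total_tokens(tokens_per_second, hours) * 1_000_000


def implied_tokens_per_second(total_usd: float, hours: float,
                              usd_per_million: float) -> float:
    """Steady tok/s needed to spend total_usd in the given hours at a price."""
    tokens = total_usd / usd_per_million * 1_000_000
    return tokens / (hours * SECONDS_PER_HOUR)


# Figures from the thread: $4000, about three days, ~300 tok/s.
tokens = total_tokens(300, 72)
price = implied_price_per_million(4000, 72, 300)
print(f"{tokens:,.0f} tokens, ${price:.2f} per million")
# -> 77,760,000 tokens, $51.44 per million
```

Swapping in a different assumed price via `implied_tokens_per_second` shows how sensitive the estimate is. A rate like this need not come from a single stream: several parallel agents would add up to the same total.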

	unholiness 17 minutes ago | parent [-]
		Subagents are a helluva drug.

rubyn00bie 33 minutes ago | parent | prev [-]

Just outta curiosity, as I’ve never gotten a spend anywhere near that, what variant were you using? Like max context window and fast mode? Or was it just chugging along non stop for three days?

	pimeys 20 minutes ago | parent [-]
		Fast mode, max context window. The task was: migrate all 1600+ queries from one database to another and make the whole integration test suite pass. We did multiple passes, each with a different concern in moving from one database to the other. My OpenCode session right now says $4,365.02. I haven't gotten close to this before either, but we wanted to move fast because this branch gets conflicts all the time and we want to get the migration over with ASAP.

schaefer an hour ago | parent | prev | next [-]

> I don't think anyone has a firm grasp on actual inference costs.

There are huge numbers of users (myself included) that do have an exact idea of what inference costs on open models. Because we can buy tokens from 3rd parties that have no motivation to subsidize our use. That's to say, there's a fair marketplace[1] and we're hanging out there.

If you want to say "I don't think anyone has a firm grasp on actual inference costs on these proprietary/closed models", then I could agree with that.

[1]: https://openrouter.ai/rankings#leaderboard

logicchains an hour ago | parent | prev | next [-]

We have a firm grasp on actual inference costs from the various open weights model providers on OpenRouter. They don't have the money to subsidize inference and it's quite a competitive market, so the prices are representative of the costs.

MichaelMedbed 2 hours ago | parent | prev [-]

[flagged]

	kllrnohj 2 hours ago | parent | next [-]
		Regardless of whether that's true or not, US companies doing hosted inference of the models coming out of China are also significantly cheaper than OpenAI or Anthropic.
	polski-g 2 hours ago | parent | prev [-]
		Not relevant to the post.