I think this comes from the idea that serving these tokens is already expensive even without paying for training; e.g. per https://news.ycombinator.com/item?id=46613887, a self-hosted solution might only be 10-100x more affordable, at cost.

So, given that the SOTA providers, with their even larger models, also need to continuously spend considerable resources on training their next models, fund future data centers, and make a profit, the token prices are more likely to reflect the real costs than the subscription prices are.

▲ LUmBULtERA 10 hours ago | parent [-]

Except there are plenty of inference providers worldwide (including the US) that serve open-weight models that are not subsidized, and are reasonable in cost. Or is your claim that those are all running at a loss?

▲ _flux 10 hours ago | parent [-]

So they do not train models, and in addition their models are expected to be smaller than SOTA models, although we cannot know for sure by how much.

So what's the price difference, 3000x?

▲ LUmBULtERA 10 hours ago | parent [-]

My comment is about your statement "serving these tokens without paying for training is already expensive"...

One thing we do know from OpenAI's leaked financial document is that they are already profitable on inference, though that data is not broken down by cost and revenue of API vs. subscription. One important factor is that subscription inference can be optimized in ways that reduce cost (e.g., usage limits, batch optimization around API-prioritized inference, etc.). I think we simply do not know the actual cost of subscription inference for SOTA models.

▲ _flux 7 hours ago | parent [-]

So I let ChatGPT do the legwork for me, but it does seem the price difference between inference for GPT-5.5 and open-weight frontier models like DeepSeek V4 Pro and Kimi K2.6, both of which are smaller models and thus cheaper to run inference on, is only 8x or so.

Sources: https://openai.com/business/pricing/#api says for GPT-5.5: `Input: $5.00 / 1M tokens Cached input: $0.50 / 1M tokens Output: $30.00 / 1M tokens`, and https://docs.fireworks.ai/serverless/pricing says for DeepSeek V4 Pro: `Input: $1.74 / 1M tokens Cached input: $0.145 / 1M tokens Output: $3.48 /`

Ratios are: 2.8, 3.4, 8.6

So, as these prices seem reasonably comparable to SOTA pricing, and the SOTA vendors have additional overhead, I think it is fair to conclude that the alternative explanation offered here is not the explanation:

> Why do you think that subscriptions are subsidized and not that enterprise tokens are sold at 3000% margin?

As the GPT-5.5 API tokens do not seem to carry a significant margin, judging by the overhead-free companies selling inference for smaller models at prices of the same scale, I think we can believe that the subscriptions must be heavily subsidized.

It should be noted, though, that DeepSeek itself sells this even cheaper, but they may also be in it to gain market share.
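
For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the ratios from the prices quoted above. The dictionary keys and the `price_ratios` helper are my own naming, not anything from either vendor, and since the DeepSeek output price is cut off after the slash in the quote, the per-1M-token unit for it is an assumption.

```python
# Prices in USD as quoted in the comment above.
# Assumption: all figures are per 1M tokens; the quoted DeepSeek output
# price is truncated after the slash, so its unit is assumed, not confirmed.
gpt_5_5 = {"input": 5.00, "cached_input": 0.50, "output": 30.00}
deepseek_v4_pro = {"input": 1.74, "cached_input": 0.145, "output": 3.48}


def price_ratios(expensive, cheap):
    """Return the expensive/cheap price ratio for each shared token category."""
    return {
        kind: expensive[kind] / cheap[kind]
        for kind in expensive
        if kind in cheap
    }


ratios = price_ratios(gpt_5_5, deepseek_v4_pro)
for kind, ratio in ratios.items():
    print(f"{kind}: {ratio:.2f}x")
# input: 2.87x
# cached_input: 3.45x
# output: 8.62x
```

So the figures above are the same ratios, just truncated to one decimal. Either way the gap stays within a single order of magnitude, which is what the argument relies on.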