Currently, selling LLM inference is a red queen race: the moment you release a model, others begin distilling it and trying to sell it for less, avoiding the expensive capitalized R&D costs. This can happen because the LLM market is fundamentally, at best, minimally differentiated; consumers are willing to switch between vendors ("big labs", as you call them, though they aren't really research labs) to whoever offers the best model at the lowest price. The distributors of many LLMs, developer tools, reinforce this by offering ways to switch the LLM at runtime (see https://www.jetbrains.com/help/ai-assistant/use-custom-model... or https://code.visualstudio.com/docs/copilot/customization/lan... for examples). Distributors actively working against LLM providers' margins is an exceptionally strong headwind.
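
To make the "minimally differentiated" point concrete, here is a toy sketch of how a developer tool might choose a backend when models are interchangeable. Everything in it is hypothetical: the vendor names, prices, quality scores, and the `pick_vendor` helper are illustrative assumptions, not any real tool's API or any vendor's real pricing.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Vendor:
    """A hypothetical LLM vendor as a developer tool might see it."""
    name: str
    usd_per_million_tokens: float   # illustrative price, not any real vendor's
    quality_score: float            # illustrative benchmark score in [0, 1]
    complete: Callable[[str], str]  # stand-in for the vendor's completion API


def pick_vendor(vendors: List[Vendor], min_quality: float) -> Vendor:
    # Among vendors that are "good enough", take the cheapest.
    # Nothing else ties the user to a vendor, so switching is a one-line change.
    eligible = [v for v in vendors if v.quality_score >= min_quality]
    if not eligible:
        raise ValueError("no vendor meets the quality bar")
    return min(eligible, key=lambda v: v.usd_per_million_tokens)


VENDORS = [
    Vendor("frontier-lab", 10.0, 0.90, lambda p: f"[frontier-lab] {p}"),
    Vendor("fast-follower", 2.0, 0.87, lambda p: f"[fast-follower] {p}"),
    Vendor("budget", 0.5, 0.70, lambda p: f"[budget] {p}"),
]

if __name__ == "__main__":
    print(pick_vendor(VENDORS, min_quality=0.85).name)  # fast-follower
    # A cheaper "good enough" entrant (e.g. a distilled model) takes the traffic.
    entrant = Vendor("distilled-clone", 1.0, 0.86, lambda p: f"[distilled-clone] {p}")
    print(pick_vendor(VENDORS + [entrant], min_quality=0.85).name)  # distilled-clone
```

The frontier vendor never wins here despite the best score, because the selection rule only cares about clearing the bar at the lowest price.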

This market dynamic begets a low-margin race to the bottom, in which no party appears able to secure the highly attractive unit economics typical of tech (think the >70% service margins common in software).
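
As a rough illustration of what that margin bar implies (notation mine: $p$ is the price charged per token, $c$ the serving cost per token, $m$ the gross margin), a margin of at least 70% requires pricing at roughly 3.3 times serving cost or more:

```latex
m = \frac{p - c}{p} \geq 0.7
\quad\Longleftrightarrow\quad
p \geq \frac{c}{1 - 0.7} \approx 3.33\,c
```

Any rival willing to accept a thinner margin can price anywhere between $c$ and about $3.33\,c$ and still cover serving costs, which is exactly the room the race to the bottom eats into.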

Inference is a very tough business. It is my opinion (and likely that of many others) that its margins will not sustain a typical "tech" business without continual investment in developing increasingly complex and expensive models, which is itself unprofitable.

humanizersequel 4 days ago

I don't disagree but you're moving the goalposts. I never said that they could achieve the profits of a typical tech business, just that they could be profitable. Also, the whole distilling problem doesn't happen if the model is proprietary.

mgh95 4 days ago
> I don't disagree but you're moving the goalposts. I never said that they could achieve the profits of a typical tech business, just that they could be profitable.

First, in the absence of typical software margins, they will be eroded by providers content with "good enough" margins (AWS, Azure, GCP, etc.), who earn more profit from the bundled services than OpenAI does from the primary service. This has happened multiple times in history, resulting either in smaller businesses trading below their IPO price (such as Elastic, HashiCorp, etc.) or in outright bankruptcy.

> Also, the whole distilling problem doesn't happen if the model is proprietary.

Second, distillation works on the outputs of the model. Model distillation refers to using a model's outputs to train a secondary, smaller model. Do not mistake distillation for training (or retraining) into sparse models. You can absolutely distill proprietary models. In fact, that is how DeepSeek-R1-Distill-Qwen and DeepSeek-R1-Distill-Llama were trained. It also happens with Chinese startups distilling OpenAI models to resell them [2]. Worse, OpenAI is already having to provide APIs for doing this [1]. This is not ideal, as OpenAI wants to lock people into a single platform as much as possible.

I really don't like OpenAI's market position here. I don't think it's profitable in the long term.

[1] https://openai.com/index/api-model-distillation/
[2] https://www.theguardian.com/technology/2025/jan/29/openai-ch...
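
A toy sketch of the point that distillation needs only outputs, not weights. It assumes a made-up "teacher" whose parameters are hidden inside a closure (standing in for a proprietary API) and a smaller two-parameter "student" fitted by ordinary least squares on the teacher's answers. Real LLM distillation fine-tunes a student network on teacher-generated text; the function names here (`make_black_box_teacher`, `collect_outputs`, `train_student`) are hypothetical.

```python
import random
from typing import Callable, List, Tuple


def make_black_box_teacher() -> Callable[[float], float]:
    """Hypothetical 'proprietary' teacher: callers can query it, never inspect it."""
    hidden = (0.7, 2.5, 0.05)  # stand-ins for weights the caller never sees

    def query(x: float) -> float:
        c0, c1, c2 = hidden
        return c0 + c1 * x + c2 * x * x
    return query


def collect_outputs(teacher: Callable[[float], float],
                    prompts: List[float]) -> List[Tuple[float, float]]:
    # Step 1: query the teacher and log (input, output) pairs, like logging API responses.
    return [(x, teacher(x)) for x in prompts]


def train_student(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    # Step 2: fit a smaller student (a 2-parameter line) to the teacher's outputs
    # with ordinary least squares. The teacher's weights are never used.
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    teacher = make_black_box_teacher()
    rng = random.Random(0)
    prompts = [rng.uniform(0.0, 1.0) for _ in range(200)]
    slope, intercept = train_student(collect_outputs(teacher, prompts))
    for x in (0.0, 0.5, 1.0):
        print(f"x={x}: teacher={teacher(x):.3f} student={slope * x + intercept:.3f}")
```

The student ends up closely matching the teacher on the queried range even though it has fewer parameters and never saw `hidden`, which is why keeping weights proprietary does not by itself prevent distillation.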