So, bottom line, do you think it’s probable that either OpenAI or Anthropic is “losing money on inference”?

No. In some sense, the article comes to the right conclusion haha. But it's probably >100x off on its central premise about output tokens costing more than input.

martinald 5 days ago | parent | next [-]

Thanks for the correction (author here). I'll update the article - very fair point on compute on input tokens which I messed up. Tbh I'm pleased my napkin math was only 7x off the laws of physics :).

Even rerunning the math on my use cases with way higher input token cost doesn't change much though.

chillee 5 days ago | parent [-]

The choice of 32 parallel sequences is also arbitrary and significantly changes your conclusions. For example, if they run with 256 parallel sequences, that would make both prefill and decode 8x cheaper in your calculations.
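A rough sketch of where the 8x comes from, under one simplifying assumption that is mine and not from the article: a single forward pass has a fixed cost no matter how many sequences share it, which only holds while the step is memory-bandwidth-bound. `STEP_COST` and `cost_per_token` are hypothetical names, and the units are arbitrary.

```python
def cost_per_token(step_cost: float, parallel_sequences: int) -> float:
    """Cost attributed to one token when a forward pass of fixed cost
    `step_cost` yields one token for each of `parallel_sequences`."""
    if parallel_sequences <= 0:
        raise ValueError("parallel_sequences must be positive")
    return step_cost / parallel_sequences


# Hypothetical cost of one forward pass, in arbitrary units.
# Assumption: this does not grow with the number of parallel sequences.
STEP_COST = 1.0

cost_32 = cost_per_token(STEP_COST, 32)
cost_256 = cost_per_token(STEP_COST, 256)

# How much cheaper each token gets when going from 32 to 256 sequences.
ratio = cost_32 / cost_256
print(f"32 -> 256 parallel sequences: {ratio:.0f}x cheaper per token")
```

Once the batch is large enough to be compute-bound, the fixed-cost assumption breaks and the savings flatten out, so treat the ratio as an upper bound for this toy model.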

The component about requiring long context lengths to be compute-bound for attention is also quite misleading.

	Barbing 5 days ago | parent [-]
		Anyone up for publishing their own guess range?

doctorpangloss 5 days ago | parent | prev [-]

I’m pretty sure input tokens are cheap because they want to ingest the data for training later, no? They want huge contexts to slice up.

	awwaiid 4 days ago | parent [-]
		Afaik all the large providers flipped the default to contractually NOT train on your data. So no, training data context size is not a factor.

diamond559 5 days ago | parent | prev [-]

Even if it is, ignoring the biggest costs going into the product and then claiming they are profitable would be actual fraud.