| ▲ | Den_VR 5 days ago |
So, bottom line, do you think it’s probable that either OpenAI or Anthropic is “losing money on inference”?
|
| ▲ | chillee 5 days ago | parent | next [-] |
No. In some sense, the article comes to the right conclusion, haha. But it's probably >100x off on its central premise about output tokens costing more than input.
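
For intuition, here is a rough napkin-math sketch of the prefill/decode asymmetry being discussed: prefill pushes the whole prompt through the model in one compute-bound pass, while decode has to stream the weights from memory for every generated token. Every number (model size, GPU specs, batch size) is an assumption chosen for illustration, not any provider's actual setup.

    # Toy comparison of input (prefill) vs output (decode) token throughput.
    # All figures are assumed: a hypothetical 70B-parameter dense model in
    # bf16 on roughly H100-class hardware.
    PARAMS = 70e9                  # assumed dense parameter count
    FLOPS_PER_TOKEN = 2 * PARAMS   # ~2 FLOPs per parameter per forward-pass token
    WEIGHT_BYTES = PARAMS * 2      # bf16 weights
    GPU_FLOPS = 1e15               # assumed ~1 PFLOP/s dense bf16 compute
    GPU_BW = 3.35e12               # assumed ~3.35 TB/s HBM bandwidth
    BATCH = 32                     # concurrent sequences during decode

    # Prefill: all prompt tokens go through in one pass, so it is compute-bound.
    prefill_tokens_per_s = GPU_FLOPS / FLOPS_PER_TOKEN

    # Decode: each step emits one token per sequence but must read the full
    # weights from HBM, so it is bandwidth-bound and shared across the batch.
    decode_tokens_per_s = (GPU_BW / WEIGHT_BYTES) * BATCH

    print(f"prefill: ~{prefill_tokens_per_s:,.0f} input tokens/s")
    print(f"decode : ~{decode_tokens_per_s:,.0f} output tokens/s")
    print(f"ratio  : ~{prefill_tokens_per_s / decode_tokens_per_s:.0f}x")

With these made-up numbers an output token comes out roughly 9x more expensive than an input token, and the exact ratio moves a lot with the batch size and hardware assumed.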

| ▲ | martinald 5 days ago | parent | next [-]
Thanks for the correction (author here). I'll update the article - very fair point on the compute for input tokens, which I messed up. Tbh I'm pleased my napkin math was only 7x off the laws of physics :). Even rerunning the math on my use cases with a much higher input token cost doesn't change much, though.

| ▲ | chillee 5 days ago | parent [-]
The choice of 32 parallel sequences is also arbitrary and significantly changes your conclusions. For example, if they run with 256 parallel sequences, that would result in an 8x cheaper factor in your calculations for both prefill and decode. The point about requiring long context lengths to be compute-bound for attention is also quite misleading.
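
A minimal sketch of the batch-size arithmetic above, under the assumption that decode cost is dominated by streaming the weights from HBM (KV-cache and attention traffic ignored). The model size, bandwidth, and $2/hour GPU price are all hypothetical.

    # Per-output-token decode cost vs batch size, assuming the cost is the
    # time spent streaming bf16 weights from HBM, shared across the batch.
    PARAMS = 70e9                        # assumed dense parameter count
    WEIGHT_BYTES = PARAMS * 2            # bf16 weights read once per decode step
    GPU_BW = 3.35e12                     # assumed ~3.35 TB/s HBM bandwidth
    GPU_DOLLARS_PER_SECOND = 2.0 / 3600  # assumed $2/hour GPU rental

    def decode_cost_per_million_tokens(batch_size: int) -> float:
        seconds_per_step = WEIGHT_BYTES / GPU_BW           # one weight read per step
        seconds_per_token = seconds_per_step / batch_size  # step shared by the batch
        return seconds_per_token * GPU_DOLLARS_PER_SECOND * 1e6

    for batch in (32, 256):
        print(f"batch {batch:>3}: ~${decode_cost_per_million_tokens(batch):.2f} per 1M output tokens")

In this toy model, going from 32 to 256 concurrent sequences makes each output token 8x cheaper, because the same weight read is amortised over 8x as many tokens.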

| ▲ | doctorpangloss 5 days ago | parent | prev [-]
I’m pretty sure input tokens are cheap because they want to ingest the data for training later, no? They want huge contexts to slice up.

| ▲ | awwaiid 4 days ago | parent [-]
Afaik all the large providers flipped the default to contractually NOT train on your data. So no, training data context size is not a factor.
|
|
|
| ▲ | diamond559 5 days ago | parent | prev [-] |
| Even if it is, ignoring the biggest costs going into the product and then claiming they are profitable would be actual fraud. |