▲ nomel | 4 hours ago
> but the published data immediately seems to admit that this is a bad choice of unit because it costs a lot more to output a token than input one

And that's silly, because API pricing is more expensive for output tokens than input tokens: 5x for Anthropic [1], and 6x for OpenAI!

[1] https://platform.claude.com/docs/en/about-claude/pricing
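A quick sketch of why raw token counts are a lopsided unit. The ~5x output/input ratio comes from the linked Anthropic pricing page; the dollar figures themselves are assumptions for illustration:

```python
# Illustrative per-million-token prices. Only the ~5x output/input
# ratio is from the linked pricing page; the dollar figures are assumed.
INPUT_PRICE_PER_M = 3.00    # $ per 1M input tokens (assumed)
OUTPUT_PRICE_PER_M = 15.00  # $ per 1M output tokens (assumed, 5x input)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API request under the assumed prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Same 100k total tokens, very different cost:
print(request_cost(90_000, 10_000))  # input-heavy:  0.42
print(request_cost(10_000, 90_000))  # output-heavy: 1.38
```

Two requests with identical total token counts differ more than 3x in cost, which is the commenter's point about the unit.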
▲ AlphaSite | 2 hours ago | parent
I think, for the same model, wall time is probably a more intuitive metric; at the end of the day, what you're doing is renting GPU time slices. Large outputs dominate compute time, so they are more expensive. IMO, input and output token counts are still a bad metric, since they linearize nonlinear cost growth, and I suspect we'll see another change in the future where providers bucket pricing by context length. XL output contexts might be 20x more expensive instead of 10x.
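A hypothetical sketch of what context-length-bucketed pricing could look like. The bucket thresholds, multipliers, and base price below are all invented for illustration, not any provider's actual pricing:

```python
# Hypothetical context-length price buckets: the base output price is
# multiplied once the context crosses a threshold. All numbers are
# invented for illustration, not any provider's actual pricing.
BASE_OUTPUT_PRICE_PER_M = 15.00  # $ per 1M output tokens (assumed)

# (context-length threshold, price multiplier); first matching bucket wins.
BUCKETS = [
    (200_000, 2.0),  # XL contexts: 2x the base output price
    (32_000, 1.5),   # long contexts: 1.5x
    (0, 1.0),        # everything else: base price
]

def output_cost(context_tokens: int, output_tokens: int) -> float:
    """Dollar cost of the output, scaled by the context-length bucket."""
    multiplier = next(mult for threshold, mult in BUCKETS
                      if context_tokens >= threshold)
    return output_tokens * BASE_OUTPUT_PRICE_PER_M * multiplier / 1_000_000

print(output_cost(10_000, 5_000))   # base bucket:  0.075
print(output_cost(250_000, 5_000))  # XL bucket:    0.15
```

The same 5k output tokens cost twice as much in the XL bucket, which is the kind of nonlinearity that a flat per-token price linearizes away.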