monooso | 12 hours ago
Paul Kinlan published a blog post a couple of days ago [1] with some interesting data showing that output tokens account for only 4% of token usage. It's a pretty wide-reaching article, so here's the relevant quote (emphasis mine):

> Real-world data from OpenRouter's programming category shows 93.4% input tokens, 2.5% reasoning tokens, and just 4.0% output tokens. It's almost entirely input.
colwont | 5 hours ago
This reduces token usage by asking the model to think in AXON: https://colwill.github.io/axon
weird-eye-issue | 12 hours ago
Yes, but with prompt caching cutting input costs by 90%, and with output tokens being uncacheable and more expensive per token, what do you think that results in?
wongarsu | 12 hours ago
However, output tokens are 5-10 times more expensive, so the totals end up a lot more even on price.
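To make that concrete, here's a back-of-envelope sketch using the token mix quoted above. All prices, the 90% cache discount, and the 80% cache-hit rate are illustrative assumptions, not any provider's actual rates:

```python
# Hypothetical cost split for 100k tokens with the quoted mix:
# 93.4% input, 2.5% reasoning, 4.0% output.
TOTAL_TOKENS = 100_000
mix = {"input": 0.934, "reasoning": 0.025, "output": 0.040}

# Assumed list prices per 1M tokens; output ~5x input,
# reasoning billed at the output rate (a common convention).
price_per_m = {"input": 3.00, "reasoning": 15.00, "output": 15.00}
cache_discount = 0.90         # assumed: cached input billed at 10% of list
cached_input_fraction = 0.80  # assumed: 80% of input tokens hit the cache

cost = {}
for kind, frac in mix.items():
    tokens = TOTAL_TOKENS * frac
    per_token = price_per_m[kind] / 1_000_000
    if kind == "input":
        cached = tokens * cached_input_fraction
        cost[kind] = (cached * per_token * (1 - cache_discount)
                      + (tokens - cached) * per_token)
    else:
        cost[kind] = tokens * per_token

total = sum(cost.values())
for kind in mix:
    print(f"{kind:9s} {mix[kind]:5.1%} of tokens -> {cost[kind]/total:5.1%} of cost")
```

Under these assumptions the 4% of output tokens end up around a third of the spend (over half once reasoning is included), which is why the totals come out far more even than the raw token mix suggests.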
verdverm | 8 hours ago
My own output token ratio is 2%, a 50% saving on the expensive tokens (I count thinking tokens in this, which are often the larger share). I have similar tone and output-formatting content in my system prompt.