h14h 4 hours ago

For some real data, Artificial Analysis reported that 4.6 (max) and 4.7 (max) used 160M tokens and 100M tokens to complete their benchmark suite, respectively:

https://artificialanalysis.ai/?intelligence-efficiency=intel...

Looking at their cost breakdown, while input cost rose by $800, output cost dropped by $1400. Granted, whether the output savings offset the input increase will be very use-case dependent, and I imagine the delta is a lot closer at lower effort levels.
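To make the comparison concrete, here's a minimal sketch of pricing a whole benchmark run rather than per-token rates. All prices and token splits below are hypothetical placeholders for illustration, not Artificial Analysis figures:

```python
# Sketch: compare total suite cost end-to-end, not $/token.
# All prices and token counts are hypothetical placeholders.

def suite_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Total dollars for a run, given token counts and $/M-token prices."""
    return (input_tokens / 1e6) * input_price_per_m \
         + (output_tokens / 1e6) * output_price_per_m

# Hypothetical: the newer model reads more input but emits far less output.
old = suite_cost(40e6, 120e6, input_price_per_m=3.0, output_price_per_m=15.0)
new = suite_cost(60e6,  40e6, input_price_per_m=3.0, output_price_per_m=15.0)

print(f"old: ${old:,.0f}  new: ${new:,.0f}  delta: ${new - old:,.0f}")
```

With these made-up numbers, input cost rises ($120 to $180) while output cost falls ($1800 to $600), so the run gets cheaper overall despite the higher input bill, which is the same shape as the reported breakdown.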

theptip 2 hours ago

This is the right way of thinking end-to-end.

Tokenizer changes are certainly one piece to understand, but as you say, you need to evaluate $/task, not $/token or #tokens/task alone.