h14h | 4 hours ago
For some real data: Artificial Analysis reported that 4.6 (max) and 4.7 (max) used 160M tokens and 100M tokens, respectively, to complete their benchmark suite: https://artificialanalysis.ai/?intelligence-efficiency=intel... Looking at their cost breakdown, input cost rose by $800 while output cost dropped by $1400. Granted, whether the output savings offset the higher input cost will be very use-case dependent, and I imagine the delta is a lot closer at lower effort levels.
theptip | 2 hours ago
This is the right way to think about it end-to-end. Tokenizer changes are one piece to understand for sure, but as you say, you need to evaluate $/task, not $/token or #tokens/task alone.
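To make the end-to-end comparison concrete, here is a minimal sketch of the $/task arithmetic. The prices and token splits below are illustrative placeholders, not the Artificial Analysis figures; only the shape of the comparison (higher per-token prices offset by fewer output tokens) comes from the thread.

    # Minimal sketch: compare two models by cost per task, not cost per token.
    # All prices and token counts are hypothetical, for illustration only.

    def run_cost(input_tokens, output_tokens, input_price_per_mtok, output_price_per_mtok):
        """Total dollar cost of one benchmark run."""
        return (input_tokens / 1e6) * input_price_per_mtok \
             + (output_tokens / 1e6) * output_price_per_mtok

    # Hypothetical older model: cheaper per token, but more output tokens per task.
    old = run_cost(input_tokens=40e6, output_tokens=120e6,
                   input_price_per_mtok=10, output_price_per_mtok=30)

    # Hypothetical newer model: pricier per token, but far fewer output tokens.
    new = run_cost(input_tokens=40e6, output_tokens=60e6,
                   input_price_per_mtok=30, output_price_per_mtok=40)

    print(f"old run: ${old:,.0f}  new run: ${new:,.0f}  delta: ${new - old:,.0f}")
    # With these made-up numbers: old = $4,000, new = $3,600.
    # A higher $/token can still mean a lower $/task if the task finishes in fewer tokens.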