tedsanders 6 hours ago

Yep, it's more expensive per token.

However, I do want to emphasize that this is per token, not per task.

If we look at Opus 4.7, it uses more tokens (1-1.35x more than Opus 4.6), and it was also trained to think longer. https://www.anthropic.com/news/claude-opus-4-7

On the Artificial Analysis Intelligence Index eval, for example, in order to hit a score of 57%, Opus 4.7 takes ~5x as many output tokens as GPT-5.5, which dwarfs the difference in per-token pricing.

The token differential varies a lot by task, so it's hard to give a reliable rule of thumb (I'm guessing it's usually well below ~5x). But I hope this shows that price per task is not a linear function of price per token: different models use different token vocabularies and different numbers of tokens.
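To make the arithmetic concrete, here's a minimal sketch of why per-token price alone doesn't determine per-task cost. The prices and token counts below are made up for illustration, not actual pricing for any model:

```python
# Hypothetical, illustrative numbers -- not real pricing for any model.
def cost_per_task(output_tokens: int, price_per_million: float) -> float:
    """Dollar cost of one task, given output tokens used and $/1M-token price."""
    return output_tokens * price_per_million / 1_000_000

# Suppose model A charges 2x more per token but needs 5x fewer tokens per task:
model_a = cost_per_task(output_tokens=20_000, price_per_million=10.0)
model_b = cost_per_task(output_tokens=100_000, price_per_million=5.0)

print(model_a)  # 0.2
print(model_b)  # 0.5
```

Even with double the per-token price, the more token-efficient model ends up 2.5x cheaper per task.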

We have raised per-token prices for our last couple models, but we've also made them a lot more efficient for the same capability level.

(I work at OpenAI.)

2001zhaozhao 5 hours ago | parent | next [-]

I don't have anything to add, but I like how you guys are actually sending people to communicate in Hacker News. Brilliant.

simianwords 6 hours ago | parent | prev [-]

It might be a good idea to be more explicit about this -- a cost-per-task benchmark would be a nice accompaniment.

This kind of thing keeps popping up each time a new model is released and I don't think people are aware that token efficiency can change.

tedsanders 6 hours ago | parent [-]

Agreed. Would be great if everyone started reporting cost per task alongside eval scores, especially in a world where you can spend arbitrary amounts of test-time compute. This is one thing I like about the Artificial Analysis website - they include cost to run alongside their eval scores: https://artificialanalysis.ai/