grav · 4 hours ago

I fail to understand how two LLMs would be "consuming" a different amount of tokens given the same input. Does it refer to the number of output tokens? Or is it in the context of some agentic loop (e.g. Claude Code)?
lemonfever · 4 hours ago

Most LLMs output a whole bunch of tokens to help them reason through a problem, often called chain of thought, before giving the actual response. This has been shown to improve performance a lot, but it uses a lot of tokens.
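A minimal sketch of the accounting being described: the visible answer can be identical across two models, while hidden chain-of-thought tokens make one much more expensive. The field names here are illustrative, not any vendor's actual API schema.

```python
# Sketch: hidden reasoning ("thinking") tokens inflate billed output.
# "visible_output_tokens" and "reasoning_tokens" are hypothetical field names.

def billed_output_tokens(usage: dict) -> int:
    """Output tokens the provider charges for: the visible answer
    plus any chain-of-thought tokens generated before it."""
    return usage.get("visible_output_tokens", 0) + usage.get("reasoning_tokens", 0)

# Two models answering the same prompt with the same visible answer:
terse_model = {"visible_output_tokens": 150, "reasoning_tokens": 0}
reasoning_model = {"visible_output_tokens": 150, "reasoning_tokens": 2400}

print(billed_output_tokens(terse_model))      # 150
print(billed_output_tokens(reasoning_model))  # 2550
```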
| ||||||||
jcims · 4 hours ago

One very specific and limited example: when asked to build something, 4.6 seems to do more web searches in the domain to gather the latest best practices for various components/features before planning/implementing.
andrewchilds · 4 hours ago

I've found that Opus 4.6 is happy to read a significant amount of the codebase in preparation for doing something, whereas Opus 4.5 tends to be much more efficient and targeted about pulling in relevant context.
| ||||||||
Gracana · 3 hours ago

They're talking about output consuming tokens from the pool allowed by the subscription plan.
bsamuels · 4 hours ago

Thinking tokens, output tokens, etc., plus being more clever about file reads/tool calling.
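The file-read/tool-call point can be sketched as well: in a typical agentic loop the whole conversation is re-sent on every turn, so each extra tool result compounds input-token consumption. This is an illustrative model under that assumption, not any real agent's implementation.

```python
# Sketch: why extra file reads / tool calls multiply input-token usage.
# Assumes the full context (prompt plus all prior tool results) is
# re-sent to the model on every turn of the loop.

def total_input_tokens(context_tokens: int, tool_result_sizes: list[int]) -> int:
    """Sum of input tokens across turns, where each tool result is
    appended to the context before the next turn."""
    total = 0
    for size in tool_result_sizes:
        total += context_tokens      # current context sent this turn
        context_tokens += size       # tool output appended for next turn
    return total

# A model that makes 3 small, targeted reads vs. one that makes 8 large ones:
print(total_input_tokens(1000, [500, 500, 500]))  # 4500
print(total_input_tokens(1000, [2000] * 8))       # 64000
```

The gap grows quadratically with the number of tool calls, which is why being "more clever" about reads matters more than the raw size of any single file.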