grav 4 hours ago

I fail to understand how two LLMs would be "consuming" a different number of tokens given the same input. Does it refer to the number of output tokens? Or is it in the context of some "agentic loop" (e.g. Claude Code)?

lemonfever 4 hours ago | parent | next [-]

Most LLMs output a whole bunch of tokens to help them reason through a problem, often called chain of thought, before giving the actual response. This has been shown to improve performance a lot, but it also uses a lot of tokens.
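
As a rough illustration of how that shows up in practice, here's a minimal sketch assuming the Anthropic Python SDK, with placeholder model IDs; thinking/reasoning tokens are generally billed as output tokens, so the difference shows up in the usage numbers:

    # Send the same prompt to two models and compare output token usage.
    # Model IDs below are placeholders, not real identifiers; a model that
    # reasons more shows up here as a larger output count for the same input.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    PROMPT = "Should I drive or walk 50 meters to the car wash?"

    for model in ["model-a", "model-b"]:  # placeholder model IDs
        resp = client.messages.create(
            model=model,
            max_tokens=2048,
            messages=[{"role": "user", "content": PROMPT}],
        )
        print(f"{model}: input={resp.usage.input_tokens} "
              f"output={resp.usage.output_tokens}")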

zozbot234 3 hours ago | parent [-]

Yup, they all need to do this in case you're asking them a really hard question like: "I really need to get my car washed, the car wash place is only 50 meters away, should I drive there or walk?"

jcims 4 hours ago | parent | prev | next [-]

One very specific and limited example: when asked to build something, 4.6 seems to do more web searches in the domain to gather the latest best practices for various components/features before planning/implementing.

andrewchilds 4 hours ago | parent | prev | next [-]

I've found that Opus 4.6 is happy to read a significant amount of the codebase in preparation to do something, whereas Opus 4.5 tends to be much more efficient and targeted about pulling in relevant context.

OtomotO 4 hours ago | parent [-]

And way faster too!

Gracana 3 hours ago | parent | prev | next [-]

They're talking about the output consuming tokens from the pool allowed by the subscription plan.

bsamuels 4 hours ago | parent | prev [-]

Thinking tokens, output tokens, etc., and how clever it is about file reads/tool calling.
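
To make the agentic-loop angle from the original question concrete, here's a rough sketch, using hypothetical call_model/run_tool/estimate_tokens stubs rather than any real SDK, of why an agent that makes more tool calls (extra file reads, web searches) burns through more tokens on the same task: the growing context gets re-sent on every turn.

    # Hypothetical agentic loop: every tool call appends its result to the
    # conversation, so the whole (growing) context is re-sent each turn.
    # A model that decides to read more files therefore consumes more
    # input tokens even for the identical starting prompt.

    def call_model(messages):
        # Placeholder for a real LLM API call; returns either a tool
        # request or a final answer. Stubbed out for illustration.
        return {"type": "final", "text": "done"}

    def run_tool(request):
        # Placeholder for executing a tool (file read, web search, ...).
        return "tool output"

    def estimate_tokens(messages):
        # Crude stand-in for a real tokenizer: roughly 4 characters per token.
        return sum(len(m["content"]) for m in messages) // 4

    messages = [{"role": "user", "content": "Build the feature."}]
    total_input_tokens = 0

    while True:
        total_input_tokens += estimate_tokens(messages)  # context re-sent each turn
        reply = call_model(messages)
        if reply["type"] == "final":
            break
        # Tool result is appended, making every later turn more expensive.
        messages.append({"role": "tool", "content": run_tool(reply)})

    print("input tokens consumed across the loop:", total_input_tokens)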