grav 4 hours ago

I fail to understand how two LLMs would be "consuming" a different number of tokens given the same input. Does it refer to the number of output tokens? Or is it in the context of some "agentic loop" (e.g. Claude Code)?

lemonfever 4 hours ago | parent | next [-]

Most LLMs output a whole bunch of tokens to help them reason through a problem, often called chain of thought, before giving the actual response. This has been shown to improve performance a lot, but it also uses a lot of tokens.
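
As a rough illustration of how that shows up in practice, here's a minimal sketch assuming the Anthropic Python SDK, with placeholder model IDs; thinking/reasoning tokens are generally billed as output tokens, so the difference shows up in the usage numbers:

    # Send the same prompt to two models and compare output token usage.
    # Model IDs below are placeholders, not real identifiers; a model that
    # reasons more shows up here as a larger output count for the same input.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    PROMPT = "Should I drive or walk 50 meters to the car wash?"

    for model in ["model-a", "model-b"]:  # placeholder model IDs
        resp = client.messages.create(
            model=model,
            max_tokens=2048,
            messages=[{"role": "user", "content": PROMPT}],
        )
        print(f"{model}: input={resp.usage.input_tokens} "
              f"output={resp.usage.output_tokens}")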

zozbot234 3 hours ago | parent [-]

Yup, they all need to do this in case you're asking them a really hard question like: "I really need to get my car washed, the car wash place is only 50 meters away, should I drive there or walk?"

jcims 4 hours ago | parent | prev | next [-]

One very specific and limited example: when asked to build something, 4.6 seems to do more web searches in the domain to gather the latest best practices for various components/features before planning/implementing.

andrewchilds 4 hours ago | parent | prev | next [-]

I've found that Opus 4.6 is happy to read a significant amount of the codebase in preparation to do something, whereas Opus 4.5 tends to be much more efficient and targeted about pulling in relevant context.

OtomotO 4 hours ago | parent [-]

And way faster too!

Gracana 3 hours ago | parent | prev | next [-]

They're talking about the output consuming tokens from the pool allowed by the subscription plan.

bsamuels 4 hours ago | parent | prev [-]

Thinking tokens, output tokens, etc., and how clever it is about file reads/tool calling.
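
To make the agentic-loop angle from the original question concrete, here's a rough sketch, using hypothetical call_model/run_tool/estimate_tokens stubs rather than any real SDK, of why an agent that makes more tool calls (extra file reads, web searches) burns through more tokens on the same task: the growing context gets re-sent on every turn.

    # Hypothetical agentic loop: every tool call appends its result to the
    # conversation, so the whole (growing) context is re-sent each turn.
    # A model that decides to read more files therefore consumes more
    # input tokens even for the identical starting prompt.

    def call_model(messages):
        # Placeholder for a real LLM API call; returns either a tool
        # request or a final answer. Stubbed out for illustration.
        return {"type": "final", "text": "done"}

    def run_tool(request):
        # Placeholder for executing a tool (file read, web search, ...).
        return "tool output"

    def estimate_tokens(messages):
        # Crude stand-in for a real tokenizer: roughly 4 characters per token.
        return sum(len(m["content"]) for m in messages) // 4

    messages = [{"role": "user", "content": "Build the feature."}]
    total_input_tokens = 0

    while True:
        total_input_tokens += estimate_tokens(messages)  # context re-sent each turn
        reply = call_model(messages)
        if reply["type"] == "final":
            break
        # Tool result is appended, making every later turn more expensive.
        messages.append({"role": "tool", "content": run_tool(reply)})

    print("input tokens consumed across the loop:", total_input_tokens)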