o10449366 7 hours ago

I feel like this benchmark reiterates my disbelief that anyone uses the latest Anthropic models for any productive work. They seem to be the best at burning tokens and spawning unnecessary subagents even for well-defined and tightly scoped tasks.

Can we get a count of people that have had Claude read irrelevant documents or perform unnecessary web searches even when told not to from the beginning?

I'm starting to wonder if this increased token usage is inadvertently bleeding into how Anthropic actually trains their model, especially leading up to an IPO. As older models are deprecated and users are forced onto newer ones, a default that is less efficient and more token-expensive directly results in higher "profit" for Anthropic, since users have to either tolerate the extra consumption or jump to a competitor.

cbg0 3 hours ago | parent | next [-]

I've had no problems like the ones you've mentioned while using Opus 4.8. It does overthink stuff with higher effort levels but that's kind of expected.

mwigdahl 11 minutes ago | parent [-]

Same (including the overthinking issue).

mrngld 2 hours ago | parent | prev | next [-]

Now that enterprise customers are pay-as-you-go with tokens, I suspect we'll see renewed interest in OpenAI and their focus on token efficiency. At least I hope so, if the alternative is abandoning the tools entirely.

pbowyer 7 hours ago | parent | prev | next [-]

> I feel like this benchmark reiterates my disbelief that anyone uses the latest Anthropic models for any productive work. They seem to be the best at burning tokens and spawning unnecessary subagents even for well-defined and tightly scoped tasks.

I keep Claude around for some specific tasks:

- Linked up to Figma MCP to implement front-end stuff

- Data analysis, in the "Connect AI to a data source and ask questions" way. I've tried both Opus 4.8 high and GPT 5.5 high for this, and Opus is stronger because it understands the intent of the question better

I used to keep it around for planning too, but the 4.8 plans have had more holes than Swiss cheese.

anon373839 6 hours ago | parent | prev [-]

> I'm starting to wonder if this increased token usage is inadvertently bleeding into how Anthropic actually trains their model

Related: Sonnet 5’s new tokenizer increases token usage by 30%. (https://simonwillison.net/2026/Jun/30/claude-sonnet-5/)