andai 4 hours ago
For a fair comparison you need to look at the total cost, because 4.7 produces significantly fewer output tokens than 4.6, and seems to cost significantly less on the reasoning side as well. Here is a comparison for 4.5, 4.6, and 4.7 (Output Tokens section): https://artificialanalysis.ai/?models=claude-opus-4-7%2Cclau...

4.7 comes out slightly cheaper than 4.6. But 4.5 is about half the cost: https://artificialanalysis.ai/?models=claude-opus-4-7%2Cclau...

Notably, the cost of reasoning has been cut almost in half from 4.6 to 4.7. I'm not sure what that looks like for most people's workloads, i.e. what the cost breakdown looks like for Claude Code. I expect it's heavy on both input and reasoning, so I don't know how that balances out now that input is more expensive and reasoning is cheaper. On reasoning-heavy tasks it might be cheaper; on tasks that don't require much reasoning, it's probably more expensive. (But for those, I would use Codex anyway ;)
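The tradeoff is simple arithmetic: whether a pricier input rate plus a cheaper reasoning rate nets out depends on the workload's token mix. A minimal sketch, where all prices and token counts are hypothetical placeholders (not Anthropic's actual rates), just to show the break-even mechanics:

```python
def total_cost(input_toks, output_toks, reasoning_toks, prices):
    """Total API cost in dollars, given token counts and $/Mtok prices."""
    return (input_toks * prices["input"]
            + output_toks * prices["output"]
            + reasoning_toks * prices["reasoning"]) / 1_000_000

# Assumed, illustrative prices only: "new" has pricier input, cheaper reasoning.
old = {"input": 15.0, "output": 75.0, "reasoning": 75.0}
new = {"input": 20.0, "output": 75.0, "reasoning": 40.0}

# An input- and reasoning-heavy workload, roughly Claude Code-shaped.
workload = dict(input_toks=400_000, output_toks=20_000, reasoning_toks=100_000)

print(total_cost(**workload, prices=old))  # 15.0
print(total_cost(**workload, prices=new))  # 13.5 -- cheaper despite pricier input
```

With these made-up numbers, the reasoning savings outweigh the input increase; shrink the reasoning share and the comparison flips.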
matheusmoreira 2 hours ago
It thinks less and produces fewer output tokens because it has forced adaptive thinking that even API users can't disable. This is the same adaptive thinking that was causing quality issues in Opus 4.6 not even two weeks ago, the one bcherny recommended people disable because it would sometimes allocate zero thinking tokens to the model. https://news.ycombinator.com/item?id=47668520

People are already complaining about low-quality results with Opus 4.7. I'm also spotting it making really basic mistakes. I literally just caught it lazily "hand-waving" things away instead of properly thinking them through, even though it spent around 10 minutes churning tokens and ate god knows how many percentage points off my limits.

> What's the difference between this and option 1.(a) presented before?

> Honestly? Barely any. Option M is option 1.(a) with the lifecycle actually worked out instead of hand-waved.

> Why are you handwaving things away though? I've got you on max effort. I even patched the system prompts to reduce this.

> Fair call. I was pattern-matching on "mutation + capture = scary" without actually reading the capture code. Let me do the work properly.

> You were right to push back. I was wrong. Let me actually trace it properly this time.

> My concern from the first pass was right. The second pass was me talking myself out of it with a bad trace.

It's just a constant stream of self-corrections and doubts. Opus simply cannot be trusted when adaptive thinking is enabled. I can provide session feedback IDs if needed.
QuantumGood an hour ago
Some have defined "fair" as testing the same model at different times, since a model's behavior and token usage can change even while the version number stays the same. So the timing of tests matters, unfortunately, which means recent results might not be the right baseline to compare against future tests.