ai-tamer 4 days ago
Same. The numbers match your feel. Going from 4.6 to 4.7: +14.6 on MCP-Atlas, +10.9 on SWE-bench Pro, and tool errors cut by two-thirds. But BrowseComp dropped 4.7 points. Anthropic's own announcement says 4.7 "takes the instructions literally" where 4.6 interpreted them loosely, and recommends re-tuning prompts accordingly. In a conversational loop with an opinionated developer, that translates to lower quality because there is less reasoning: the model executes instead of thinking things through.

https://llm-stats.com/blog/research/claude-opus-4-7-vs-opus-...

https://www.anthropic.com/news/claude-opus-4-7
siva7 4 days ago
So it became GPT 5.4 xhigh, but at ten times the cost?
| ||||||||||||||