sosodev 7 hours ago
I don’t know why you’re getting downvoted. It’s true. Averaged across a wide variety of benchmarks, Fable is the only Anthropic model that performs better than GPT 5.5 xhigh.
Eridrus 7 hours ago
The problem is that there are a bunch of benchmarks, the model providers often don't even use the same ones, a bunch of them have known problems, and it's expensive to run your own. I'm a GPT 5.x booster because to me it just feels smarter, and I generally felt like the benchmarks backed me up, but not every benchmark did, so sadly we're mostly arguing about vibes. SWEBench-Pro was a big one, though apparently Claude was reading solutions out of a .git folder it wasn't meant to have access to, among other problems.