In my opinion most Anthropic models are the opposite, scoring well on benchmarks but not always way on top, but quietly excellent when you actually try to use them for stuff.