Came here to post this as well. It's interesting to see how benchmarks don't always track how a model actually feels to use, which is one of the things people say in favor of Anthropic's models!