They are, in benchmarks. In practice Anthropic's models are ahead of where their benchmarks suggest.
Bear in mind that this lead may come largely from the tooling rather than from the model itself.