I'm never sure how much faith to put in benchmarks like these, but in any case the picture seems to change once you look at pass@2 and pass@3.
Still, the more interesting comparison would be against something like Codex.
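For context on why the picture changes with k: pass@k is commonly computed with the unbiased estimator introduced alongside Codex, 1 - C(n-c, k)/C(n, k), where n samples are generated per problem and c of them pass. Here is a minimal sketch, assuming that estimator is the one the benchmark uses (the post doesn't say). The sample counts below are made up purely for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: the probability that at least one of k
    samples, drawn without replacement from n generated samples of which
    c are correct, passes."""
    if n - c < k:
        # Too few incorrect samples to fill k slots, so every draw of k
        # contains at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical numbers: 10 samples per problem, 3 of them correct.
for k in (1, 2, 3):
    print(f"pass@{k} = {pass_at_k(10, 3, k):.3f}")
```

With these made-up counts this prints 0.300, 0.533 and 0.708. A model that only gets 30% on the first try already clears 70% with three tries, which is why headline numbers at different k aren't directly comparable.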