It does, but the limit isn't "human performance". AI isn't bounded by human performance. The limit is the saturation of the benchmark in question.
That is solvable with better benchmarks.