> AI benchmarks suck.
Not only do they suck, but building a good one is essentially an impossible task, since there is no frame of reference for what "good code" looks like.