>establish benchmarks that make sense and are reliable
How are current LLM coding benchmarks unreliable?
They're manipulated.
Unless you are going to be more specific, that criticism applies to every benchmark where a good score pays off for whoever is being measured, not just AI coding benchmarks.