| ▲ | Jimmc414 8 hours ago | |
Goodhart’s Law in reverse: what can’t be gamed gets rejected. | ||
| ▲ | stephen_cagle 5 hours ago | parent | next [-] | |
You've almost buffer overrun Goodhart's Law into the https://en.wikipedia.org/wiki/McNamara_fallacy . :] | ||
| ▲ | cbg0 6 hours ago | parent | prev [-] | |
SWE-bench Verified was created in collaboration with OpenAI. It's also an open dataset, which makes it prone to contamination, meaning it can be gamed. | ||