rudhdb773b 3 hours ago
Are there any up-to-date offline/private agentic coding benchmark leaderboards? If the tests haven't been published anywhere and are sufficiently different from standard problems, I would expect those benchmarks to be robust to intentional over-optimization.

Edit: These look decent and generally match my expectations: