Remix.run Logo
rudhdb773b 3 hours ago

Are there any up-to-date offline/private agentic coding benchmark leaderboards?

If the tests haven't been published anywhere and are sufficiently different from standard problems, I would think the benchmarks would be robust to intentional over optimization.

Edit: These look decent and generally match my expectations:

https://www.apex-testing.org/