2001zhaozhao 3 hours ago

You know that it's an honest benchmark when their own model (SWE-1.6) scores terribly on it.