nine_k 6 hours ago

LLM-written code passed SWE-bench even back then. This may just show that SWE-bench is an inadequate test and should not be used for serious evaluation.