Remix clone Hacker News

	▲	NitpickLawyer 5 days ago
		There's "SWE-rebench", a benchmark that tracks model release dates, so you can see how a model did on "real-world" bugs that were submitted on GitHub after the model was released (obviously this works best for open models). A few models solve 30-50% of the (new) tasks pulled from real-world repos. So ... yeah.
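
		To make the idea concrete, here's a rough sketch of the date-filtering step: only score a model on tasks created after its release date, so the fixes can't have been in its training data. This is not the benchmark's actual code. The `Task` fields, the `post_release_solve_rate` helper, and the toy data are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Task:
    task_id: str   # hypothetical identifier, not a real benchmark field
    created: date  # when the issue/PR showed up on GitHub
    solved: bool   # whether the model's patch passed the task's tests


def post_release_solve_rate(tasks, model_release):
    """Solve rate over tasks created strictly after the model's release date."""
    fresh = [t for t in tasks if t.created > model_release]
    if not fresh:
        return None  # no tasks new enough to score on
    return sum(t.solved for t in fresh) / len(fresh)


# Toy data, invented for this example only.
tasks = [
    Task("a", date(2025, 1, 10), True),   # predates the release, so ignored
    Task("b", date(2025, 3, 2), True),
    Task("c", date(2025, 3, 15), False),
    Task("d", date(2025, 4, 1), False),
]

rate = post_release_solve_rate(tasks, model_release=date(2025, 2, 1))
print(rate)
```

		The strict `>` comparison is a deliberate choice in this sketch: a task created on the release day itself is excluded, since it could in principle have been visible before the training cutoff.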