jddj 2 hours ago
For the most part, I think we get the benchmarks we deserve.

Many SWE-bench passing PRs would not be merged: https://news.ycombinator.com/item?id=47341645

Top model SWE-bench scores may be skewed by git history leaks: https://news.ycombinator.com/item?id=45214670