oofbaroomf a day ago
Interesting how Sonnet has a higher SWE-bench Verified score than Opus. Maybe that says something about scaling laws.
somebodythere a day ago
My guess is that they did RLVR post-training for SWE tasks, and a smaller model can undergo more RL steps for the same amount of computation.
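To make the compute argument concrete, here is a rough back-of-the-envelope sketch. It assumes the cost of one RL step scales linearly with parameter count (using the common approximation of about 6 FLOPs per parameter per token) and ignores rollout and environment overhead. The function name, the budget, and both model sizes are hypothetical placeholders, not figures for Sonnet or Opus.

```python
def rl_steps_for_budget(compute_budget_flops: float,
                        n_params: float,
                        tokens_per_step: float,
                        flops_per_param_token: float = 6.0) -> float:
    """Estimate how many RL steps fit in a fixed compute budget.

    Assumes per-step cost is proportional to parameter count, which is
    a simplification: real RL post-training also pays for sampling
    rollouts and running the verifier.
    """
    cost_per_step = flops_per_param_token * n_params * tokens_per_step
    return compute_budget_flops / cost_per_step


# Hypothetical numbers purely for illustration.
BUDGET = 1e24            # total FLOPs available for RL
TOKENS_PER_STEP = 1e6    # tokens processed per RL step
SMALL_PARAMS = 1e11      # made-up "smaller model" size
LARGE_PARAMS = 5e11      # made-up "larger model" size, 5x bigger

small_steps = rl_steps_for_budget(BUDGET, SMALL_PARAMS, TOKENS_PER_STEP)
large_steps = rl_steps_for_budget(BUDGET, LARGE_PARAMS, TOKENS_PER_STEP)
step_ratio = small_steps / large_steps

print(f"small model steps: {small_steps:.3g}")
print(f"large model steps: {large_steps:.3g}")
print(f"ratio (small / large): {step_ratio:.2f}")
```

Under these assumptions the step ratio simply equals the parameter ratio, so a model one fifth the size gets about five times as many RL steps for the same budget.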
benoittravers a day ago
Do you have the link to that benchmark? Can’t see where Sonnet is highlighted.