▲ | oofbaroomf a day ago
Nice to see that Sonnet performs worse than o3 on AIME but better on SWE-Bench. Often, it's easy to optimize math capabilities with RL but much harder to crack software engineering. Good to see what Anthropic is focusing on.
▲ | j_maffe a day ago | parent
That's a very contentious opinion you're stating there. I'd say LLMs have surpassed a larger percentage of SWEs in capability than they have of mathematicians.