j_maffe a day ago

That's a very contentious opinion you're stating there. I'd say LLMs have surpassed a larger percentage of SWEs in capability than they have of mathematicians.

oofbaroomf a day ago | parent [-]

Mathematicians don't do high school math competitions - the benchmark in question is AIME.

Mathematicians generally do novel research, which is hard to optimize for. Benchmarks like LiveCodeBench (leetcode-style problems), AIME, and MATH (similar to AIME) are often chosen by companies so they can flex their models' capabilities, even if the models don't perform nearly as well at the things real mathematicians and real software engineers do.

j_maffe a day ago | parent [-]

Ok then you should clarify that you meant math benchmarks and not math capabilities.