oofbaroomf a day ago

Mathematicians don't do high school math competitions - the benchmark in question is AIME.

Mathematicians generally do novel research, which is hard to optimize for. Things like LiveCodeBench (leetcode-style problems), AIME, and MATH (similar to AIME) are often chosen by companies so they can flex their model's capabilities, even if the model doesn't perform nearly as well at the things real mathematicians and real software engineers do.

j_maffe a day ago | parent [-]

Ok, then you should clarify that you meant math benchmarks and not math capabilities.