stolencode 4 days ago

> For example achieving 66.7% on the AIME 2024 dataset.

We worked _really_ hard, burned _tons_ of cash, and we're proud of our D- output. No wonder there are more papers published than actual work being done.

supermdguy 4 days ago

That corresponds to 10/15, which is actually really good (the median is around 6).

https://artofproblemsolving.com/wiki/index.php/AMC_historica...
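As a rough check of the arithmetic, here is a minimal sketch that converts the quoted percentage into a count of solved problems. It assumes the score refers to a single 15-question AIME paper; the function name `problems_solved` is made up for illustration.

```python
from fractions import Fraction

# Assumption: one AIME paper with 15 questions, one point each.
AIME_QUESTIONS = 15


def problems_solved(accuracy_percent: float, total: int = AIME_QUESTIONS) -> int:
    """Return the nearest whole number of problems for a given accuracy."""
    return round(accuracy_percent / 100 * total)


solved = problems_solved(66.7)
# Show the count and the exact fraction it represents.
print(solved, Fraction(solved, AIME_QUESTIONS))
```

If the benchmark pooled both 2024 papers (30 questions), the same percentage would instead map to 20/30, so the count depends on that assumption.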

stolencode 4 days ago

Isn't the test taken only by students under the age of 12?

Meanwhile the model is trained on these specific types of problems, does not have an apparent time or resource limit, and does not have to take the test in a proctored environment.

It's D- work. Compared to a 12 year old, okay, maybe it's B+. Is this really the point you wanted to make?

jpcompartir 4 days ago

This is a nonsense critique.

Modest results are worth publishing, as are bad results.