▲ | jules 6 days ago | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
What does this predict about LLMs ability to win gold at the International Mathematical Olympiad? | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
▲ | measurablefunc 6 days ago | parent | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Same thing it does about their ability to drive cars. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
▲ | godelski 6 days ago | parent | prev [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Depends which question you're asking. Ability to win a gold medal as if they were scored similarly to how humans are scored? or Ability to win a gold medal as determined by getting the "correct answer" to all the questions? These are subtly two very different questions. In these kinds of math exams how you get to the answer matters more than the answer itself. i.e. You could not get high marks through divination. To add some clarity, the latter would be like testing someone's ability to code by only looking at their results to some test functions (oh wait... that's how we evaluate LLMs...). It's a good signal but it is far from a complete answer. It very much matters how the code generates the answer. Certainly you wouldn't accept code if it does a bunch of random computations before divining an answer. The paper's answer to your question (assuming scored similarly to humans) is "Don’t count on it". Not a definitive "no" but they strongly suspect not. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|