gregsadetsky 4 hours ago
You're correct, of course - LLMs may get better at any task, but I meant that publishing the evals might (optimistically speaking) help LLMs get better at this task, if the eval is actually picked up and used in the training loop.
adastra22 4 hours ago | parent
That kind of “get better at” doesn’t generalize. The model will regurgitate its training data, which now includes the exact answer being looked for, so it will get better at answering that exact problem. But if you care about its fundamental reasoning and its ability to solve new problems, or even just new instances of the same problem, it is not obvious that publishing will improve that metric. Problem-solving ability largely does not come from the pretraining data.
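The contamination worry here can be made concrete. A common (simplified) way to detect whether a published eval has leaked into a training corpus is a word-level n-gram overlap check; the function names and thresholds below are illustrative, not from any real training pipeline:

```python
def ngrams(text, n=8):
    """Return the set of word-level n-grams in a string."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(eval_item, training_docs, n=8, threshold=0.5):
    """Flag an eval item if a large fraction of its n-grams
    appears verbatim in any single training document."""
    item_grams = ngrams(eval_item, n)
    if not item_grams:
        return False
    for doc in training_docs:
        overlap = len(item_grams & ngrams(doc, n)) / len(item_grams)
        if overlap >= threshold:
            return True
    return False
```

If a check like this fires, a score improvement on that eval item is evidence of memorization, not of better general problem solving - which is exactly why publishing an eval can inflate the narrow metric without moving the capability it was meant to measure.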
| ||||||||