HarHarVeryFunny 6 days ago
Surely we do know why: reinforcement learning for reasoning. These systems are trained to generate reasoning steps that led to verifiably correct conclusions during training. There are no guarantees about how they'll perform on different problems, of course, but in relatively narrow, closed domains like math and programming, it doesn't seem surprising that, done at scale, there are enough similar problems where similar reasoning logic applies and succeeds.
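The training loop described above can be sketched in miniature. This is an illustrative toy, not any lab's actual pipeline: sample candidate reasoning traces, verify each final answer programmatically, and keep only verified traces for the policy update (rejection-sampling style). All function and variable names here are hypothetical.

```python
# Toy sketch of RL-for-reasoning with verifiable rewards (illustrative only).

def verify(answer: str, expected: str) -> int:
    """Binary reward: 1 if the final answer passes a programmatic check."""
    return int(answer.strip() == expected.strip())

def filter_verified(traces, expected):
    """Keep (trace, answer) pairs whose answers verify; these would
    feed the policy update in a rejection-sampling setup."""
    return [(t, a) for t, a in traces if verify(a, expected)]

# Hypothetical sampled reasoning traces for the problem "12 * 7 = ?"
traces = [
    ("12*7 = 84", "84"),
    ("12*7 = 12*10 - 12*3 = 120 - 36 = 84", "84"),
    ("12*7 = 74", "74"),  # wrong chain: gets reward 0 and is dropped
]
kept = filter_verified(traces, "84")
print(len(kept))  # → 2 verified traces survive for training
```

The point of the toy is that the reward only checks the conclusion; which intermediate reasoning steps get reinforced is whatever happened to co-occur with verified answers, which is why transfer to nearby problems is plausible but not guaranteed.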
dcre 2 days ago | parent
We don't know why that is sufficient to enable the models to develop the capability, and we don't know what they are actually doing under the hood when they employ the capability.