Remix.run Logo
ACCount37 3 days ago

With better reasoning training, the models mitigate more and more of that entirely by themselves. They "diverge into a ditch" less, and "converge towards the right answer" more. They are able to use more and more test-time compute effectively. They bring their own supply of "wait".

OpenAI's in-house reasoning training is probably best in class, but even lesser naive implementations go a long way.