Remix.run Logo
ACCount37 an hour ago

If the easiest pathway to high performance next token prediction lies through reasoning, then training for better next token prediction ends up training for reasoning implicitly.

By now, there's every reason to believe that this is what's happening in LLMs.

"Reasoning primitives" are learned in pre-training - and SFT and RL then assemble them into high performance reasoning chains, converting "reasoning as a side effect of next token prediction" to "reasoning as an explicit first class objective".

The end result is quite impressive. By now, it seems like the gap between human reasoning and LLM reasoning isn't "an entirely different thing altogether" - it's "humans still do it better at the very top end of the performance curve - when trained for the task and paying full attention".