overgard 3 hours ago
> Specifically on the topic of next token prediction, we are already past that phase.

We really aren't past that phase at all. Reasoning models are still just next-token predictors, trained in a way that makes them think out loud, essentially. (Source: books on how LLMs actually work, and asking ChatGPT directly!) Harnesses and tool use help a little bit, but they don't fundamentally change what an LLM is.
Lerc an hour ago | parent
Perhaps you should check your books again: reasoning and reinforcement learning are different things. Some reasoning is trained by reinforcement learning, but you could also just fine-tune for reasoning. People have had better results than you would expect by brute-force inserting tokens that periodically say "wait, let me think." Reinforcement learning trains a model to produce better results, not to move toward a specific correct result. There is no future answer to predict.