Lerc 2 hours ago
Perhaps you should check your books again: reasoning and reinforcement learning are different things. Some reasoning is trained with reinforcement learning, but you could also just fine-tune for reasoning. People have had better results than you would expect by brute-force inserting tokens that periodically say "wait, let me think." Reinforcement learning trains a model to produce better results, not to move toward a specific correct result. There is no future answer to predict.
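To make the "better results, not a specific correct result" point concrete, here is a toy sketch. It assumes a REINFORCE-style policy gradient on a made-up three-answer problem. `reward` and `train_reinforce` are hypothetical names, and the reward values are invented for illustration. The update only ever sees a scalar score for an output the policy itself sampled, never a target answer, yet probability still drifts toward whichever output scores best.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward(answer_index):
    # Hypothetical scorer (assumption): it only rates how good an output is.
    # The learner is never told which answer is "correct".
    return [0.1, 0.5, 1.0][answer_index]


def train_reinforce(steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]
    baseline = 0.0  # running average reward, reduces gradient variance
    for _ in range(steps):
        probs = softmax(logits)
        # Sample an output from the current policy
        a = rng.choices(range(3), weights=probs)[0]
        r = reward(a)
        advantage = r - baseline
        baseline = 0.9 * baseline + 0.1 * r
        # Gradient of log softmax(logits)[a] w.r.t. logits[i] is 1[i == a] - probs[i]
        for i in range(3):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)


probs = train_reinforce()
print([round(p, 3) for p in probs])
```

By contrast, supervised fine-tuning would minimize cross-entropy against a fixed label. Here there is no label at all: the sampled output is nudged up or down only according to whether it scored above or below the running average.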