gmueckl 8 hours ago
What else is there to say? LLMs can at most regurgitate approximations of human reasoning steps, in the limited forms in which those steps are expressed in the training data, or interpolations thereof. That's the essence of what they are. There is no proper reasoning to be found.
aspenmartin 8 hours ago
"At most" is wrong. RL with verifiable rewards takes you beyond the quality and skills represented in the training data. I'm not aware of meaningful fundamental limits here if you scale compute enough, even though right now it's highly sample inefficient.

Since you refuse to actually define what you consider to be reasoning, let me at least put one definition out there: a system exhibits reasoning when its answer depends on nontrivial intermediate computation over the problem. If you find problems with this, fine, but at least make an effort to contribute an alternative.

If you increase test-time compute, you get better performance. If the model were just "interpolating", this wouldn't really work, would it? Models can do FrontierMath expert problems (unpublished, expert-authored, peer-reviewed math problems) that require an insane amount of compositional reasoning. If they were regurgitating training data, that wouldn't really work, would it? Chain of thought, while not always faithful to internal computation, improves performance. If the models were just regurgitating information, it wouldn't work that well, would it?

"Regurgitating training data" is also, of course, misleading. Yes, they can memorize parts of the training data, but they generalize very well.