aspenmartin 8 hours ago

"at most" is wrong. RL with verifiable rewards takes you beyond the quality and skills represented in the training data. I'm not aware of any meaningful fundamental limit here if you scale compute enough, even though right now it's highly sample inefficient.

Since you refuse to actually define what you consider to be reasoning, let me at least put a definition out there: a system exhibits reasoning when its answer depends on nontrivial intermediate computation over the problem. If you find problems with this, fine, but at least make an effort to contribute an alternative.

If you increase test-time compute, you get better performance. If the model were just "interpolating", this wouldn't really work, would it? Models can do FrontierMath expert problems (unpublished, expert-authored, peer-reviewed math problems) that require an insane amount of compositional reasoning. If they were regurgitating training data, that wouldn't really work, would it? Chain of thought, while not always faithful to internal computation, improves performance. If the models were just regurgitating information, it wouldn't work that well, would it?

"Regurgitating training data" is also, of course, misleading. Yeah, they can memorize parts of the training data, but they also generalize very well.

applicative 3 hours ago

There is the obvious limit that the supply of human text output is finite. To this you can add specific, testable training that pertains to code, but this degrades the weights for more general communication. Somehow the hype over the successes with coding in the last year or so made everyone forget the intrinsic limit posed by the exhaustion of real human text output, which is absolutely inescapable.
