lazarus01 | 3 days ago
It isn't thinking; it's RL with reward hacking. It's like taking a student who wins a gold at the IMO but can't solve easier math problems, because they never studied that type of problem, whereas a human who is good at IMO math generalizes to all math problems. It's just memorizing a trajectory in pursuit of a specific goal. That's what RL is.
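To make the "memorized trajectory" point concrete, here's a toy sketch (entirely invented: a tiny 1-D gridworld, tabular Q-learning, made-up hyperparameters; it says nothing about how any frontier model is actually trained). The agent is rewarded only for reaching one fixed goal cell, learns a policy that marches straight to it, and then fails when the goal is moved:

    # Tiny, fully invented example: tabular Q-learning in a 1-D gridworld.
    # Reward is given only for reaching one fixed goal cell, so the learned
    # greedy policy amounts to a memorized trajectory to that one cell.
    import random

    N_STATES = 10            # cells 0..9
    ACTIONS = [-1, +1]       # step left / step right
    START = 5
    EPISODES = 2000
    ALPHA, GAMMA = 0.5, 0.95

    def clip(s):
        return min(max(s, 0), N_STATES - 1)

    def train(goal):
        """Off-policy Q-learning with a fully random behavior policy."""
        q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
        for _ in range(EPISODES):
            s = START
            for _ in range(50):                       # cap episode length
                a = random.choice(ACTIONS)            # explore at random
                s2 = clip(s + a)
                r = 1.0 if s2 == goal else 0.0        # reward only at the goal
                best_next = max(q[(s2, b)] for b in ACTIONS)
                q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])
                s = s2
                if s == goal:
                    break
        return q

    def greedy_reaches(q, goal, max_steps=50):
        """Follow the learned greedy policy; does it ever reach `goal`?"""
        s = START
        for _ in range(max_steps):
            a = max(ACTIONS, key=lambda act: q[(s, act)])
            s = clip(s + a)
            if s == goal:
                return True
        return False

    q = train(goal=9)
    print("trained goal (cell 9):", greedy_reaches(q, goal=9))  # expected: True
    print("moved goal (cell 0):  ", greedy_reaches(q, goal=0))  # expected: False

The point of the sketch is narrow: optimizing a reward tied to one goal produces behavior that looks competent on that goal and nothing else. It's an analogy for the generalization concern, not a claim about what any particular LLM does internally.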
CamperBob2 | 3 days ago | parent
> It's like taking a student who wins a gold in IMO math, but can't solve easier math problems

I've tried to think of specific follow-up questions that would help me understand your point of view, but other than "Cite some examples of easier problems that a successful IMO-level model will fail at," I've got nothing.

Overfitting is always a risk, but if you can overfit to problems you haven't seen before, that's the fault of the test administrators for reusing old problem forms or otherwise not including enough variety.

GPT itself suggests[1] that problems involving heavy arithmetic would qualify, and I can see that being the case if the model isn't allowed to use tools. However, arithmetic doesn't require much in the way of reasoning, and in any case the best reasoning models are now quite decent at unaided arithmetic. The same goes for the tried-and-true 'strawberry' example GPT cites, which involves introspection of its own tokens; reasoning models are much better at that than base models. Unit conversions were another past weakness that no longer seems to crop up much.

So what would some present-day examples be, where models that can perform complex CoT tasks fail on simpler ones in ways that reveal they aren't really "thinking"?

1: https://chatgpt.com/share/695be256-6024-800b-bbde-fd1a44f281...
| |||||||||||||||||