lostmsu 6 hours ago

I'm with the OpenAI folks on this one: Atari just won't cut it for AGI. My layman intuition is that RL works well when the reward gives a good signal most of the time; when it doesn't, RL is basically random search. That's where massive data diversity, like we have in text, comes in handy.

In a game there might be a level with a door and a key, and because there's no reward for getting the key closer to the door, bridging that gap requires random search in a massive state space. But in the vast sea of scenarios you can find in Common Crawl there's probably one where you're one step from the key and the key is one step from the door, so you get a reward signal without having to search an enormous state space.
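
A toy way to see the first half of that (a rough sketch of mine, not anything from the thread): a one-dimensional corridor where the only reward comes from reaching the door after picking up the key. A purely random policy almost never stumbles onto that reward once the corridor gets long, which is the "random search in a massive state space" problem:

    # Sketch: sparse reward + random policy = you basically never see the reward.
    # Corridor of cells 0..n-1, key in the middle, door at the far end, reward
    # only when the door is reached while holding the key.
    import random

    def random_rollout(corridor_len, key_pos, door_pos, max_steps=200):
        """Random walk; returns 1 only if the key is picked up and then the door reached."""
        pos, has_key = 0, False
        for _ in range(max_steps):
            pos = max(0, min(corridor_len - 1, pos + random.choice([-1, 1])))
            if pos == key_pos:
                has_key = True
            if pos == door_pos and has_key:
                return 1  # the only nonzero reward in the whole episode
        return 0

    def success_rate(corridor_len, episodes=2000):
        key, door = corridor_len // 2, corridor_len - 1
        return sum(random_rollout(corridor_len, key, door) for _ in range(episodes)) / episodes

    for n in (5, 10, 20, 40):
        print(f"corridor length {n:2d}: reward seen in {success_rate(n):.3f} of random episodes")

The success rate collapses as the corridor grows, so the gradient signal an RL learner gets from those rollouts collapses with it.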

You might say "but you have to search through the giant Common Crawl". Well yes, but while doing so you get a reward signal not just for the key-and-door problem, but for nearly every problem in the world.

The point is: pretraining teaches models to extract the kind of signal that can guide exploration on hard search problems, and if you don't do that you're wasting your time enumerating giant state spaces.

lostmsu 6 hours ago

You can actually test (and overcome) this fairly easily by training a model simultaneously on a massive corpus of text and on Atari, while carefully balancing the learning rates between the two objectives.
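
Roughly like this (a hypothetical sketch of that setup, with toy sizes and dummy data, not a real recipe): one network with a shared trunk, a language-modeling head trained with next-token cross-entropy, a policy head trained with a REINFORCE-style Atari objective, and explicit weights controlling how much each objective contributes.

    # Sketch only: all module names, sizes, and the 1.0 / 0.1 balance are made up.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    VOCAB, N_ACTIONS, HIDDEN = 256, 18, 128   # toy stand-ins for real sizes

    class SharedModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.token_embed = nn.Embedding(VOCAB, HIDDEN)
            self.trunk = nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.ReLU())
            self.lm_head = nn.Linear(HIDDEN, VOCAB)          # text objective
            self.policy_head = nn.Linear(HIDDEN, N_ACTIONS)  # Atari objective

        def lm_logits(self, tokens):        # tokens: (batch, seq) of token ids
            return self.lm_head(self.trunk(self.token_embed(tokens)))

        def policy_logits(self, obs):       # obs: (batch, HIDDEN) game features
            return self.policy_head(self.trunk(obs))

    model = SharedModel()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    text_weight, rl_weight = 1.0, 0.1       # the "balance the learning rates" knob

    for step in range(100):
        # text batch: standard next-token cross-entropy (random dummy tokens here)
        tokens = torch.randint(0, VOCAB, (8, 32))
        lm_loss = F.cross_entropy(
            model.lm_logits(tokens[:, :-1]).flatten(0, 1), tokens[:, 1:].flatten())

        # Atari batch: REINFORCE on a dummy rollout of observations/actions/returns
        obs = torch.randn(8, HIDDEN)
        actions = torch.randint(0, N_ACTIONS, (8,))
        returns = torch.randn(8)
        logp = F.log_softmax(model.policy_logits(obs), dim=-1)
        rl_loss = -(logp[torch.arange(8), actions] * returns).mean()

        loss = text_weight * lm_loss + rl_weight * rl_loss
        opt.zero_grad()
        loss.backward()
        opt.step()

The interesting question is whether the text gradients give the policy head's shared trunk enough structure that the sparse Atari reward stops looking like random search.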