Atari games are widely used in Reinforcement Learning (RL) research as a standard benchmark.

https://github.com/Farama-Foundation/Arcade-Learning-Environ...

The goal is to develop algorithms that generalize to other tasks.

They were highly used. OpenAI even included them in their RL Gym library back in the old days when they were still doing open research. But if you look at this leaderboard from 7 (yes, seven!) years ago [1], most of them were already solved way beyond human capabilities. But we didn't get a really useful general purpose algorithm out of it. As an AI researcher, I always considered Atari a fun academic exercise, but nothing more. Similar to how recognising characters using convnets was cool in the nineties and early 00s, but didn't give us general purpose image understanding. Only modern GPUs and massive training datasets did. Nowadays most cutting-edge RL game research focuses on much more advanced games like Minecraft which is thought to be better suited. But I'm pretty sure it's still not enough. Even role-playing GTA VI won't be. We probably need a pretty advanced physical simulation of the real world before we can get agents to handle the real world. But that means solving the problem of generating such an environment first, because you can't train on the actual real world due to the sample inefficiency of all current algorithms. Nvidia is doing some really interesting research in this direction by combining physics simulation and image generation models to simulate an environment, while getting accuracy and diversity at the same time into training data. But it still feels like some key ingredient is missing.

[1]https://github.com/cshenton/atari-leaderboard

▲

mschuster91 2 months ago | parent | next [-]

> But it still feels like some key ingredient is missing.

Continuous training is the key ingredient. Humans can use existing knowledge and apply it to new scenarios, and so can most AI. But AI cannot permanently remember the result of its actions in the real world, and so its body of knowledge cannot expand.

Take a toddler and an oven. The toddler has no concept of what an oven is other than maybe that it smells nice. The toddler will touch the oven, notice that it experiences pain (because the oven is hot) and learn that oven = danger. Place a current AI in a droid toddler body? It will never learn and keep touching the oven as soon as the information of "oven = danger" is out of the context window.

For some cases this inability to learn is actually desirable. You don't want anyone and everyone to be able to train ChatGPT unsupervised, otherwise you get 4chan flooding it with offensive crap like they did to Tay [1], but for AI that physically interacts with the meatspace, constant evaluation and learning is all but mandatory if it is to safely interact with its surroundings. "Dumb" robots run regular calibration cycles for their limbs to make sure they are still aligned to compensate for random deviations, and so will AI robots.

[1] https://en.wikipedia.org/wiki/Tay_(chatbot)

▲

sigmoid10 2 months ago | parent | next [-]

This kind of context management is not that hard, even when building LLMs. Especially when you have huge windows like we do today. Look at how ChatGPT can remember things permanently after you said them once using a function call to edit the permanent memory section inside the context. You can also see that in Anthropic's latest post on Claude 4 where it learns to play Pokemon. The only remaining issue here is maybe how to diffuse explicit knowledge from the stored context into the weights. Andrej Karpathy wrote a good piece on this recently. But personally I believe this might not even be necessary if you can manage your context well enough and see it more like RAM while the LLM is the CPU. For your example you can then always just fetch such information from a permanent storage like a VDB and load it into context once you enter an area in the real world.

▲

mr_toad 2 months ago | parent | next [-]

Big context windows are a poor substitute for updating the weights. Its like keeping a journal because your memory is failing.

	▲	fzzzy 2 months ago \| parent [-]
		It reminds me of the movie Memento.

▲

vectorisedkzk 2 months ago | parent | prev | next [-]

Having used vectorDBs before, we're very much not there yet. We don't have any appreciable amounts of context for any reasonable real-life memory. It works if that is the most recent thing you did. Have you talked to an LLM for a day? Stuff is gone before the first hour. You have to use every trick currently in the book, treat context like it's your precious pet

	▲	sigmoid10 2 months ago \| parent [-]
		VectorDBs are basically just one excuse of many to make up for a part of the system that is lacking capability due to technical limitations. I'm currently at 50:50 if the problems will be overcome directly by the models or by such support systems. Used to be 80:20 but models have grown in usefulness much faster than all the tools we built around them.

▲

2 months ago | parent | prev | next [-]

[deleted]

▲

mschuster91 2 months ago | parent | prev [-]

> This kind of context management is not that hard, even when building LLMs.

It is, at least if you wish to be in the meatspace, that's my point. Every day has 86400 seconds during which a human brain constantly adapts to and learns from external input - either directly as it's being awake or indirectly during nighttime cleanup processes.

On top of that, humans have built-in filters for training. Basically, we see some drunkard shouting about the Hollow Earth on the sidewalk... our brain knows that this is a drunkard and that Hollow Earth is absolutely crackpot material, so if it stores anything at all then the fact that there is a drunkard on that street and one might take another route next time, but the drunkard's rambling is forgotten maybe five minutes later.

AI, in contrast, needs to be hand-held by humans during training that annotate, "grade" or weigh information during the compilation of the training dataset, in order that the AI knows what is written in "Mein Kampf" so it can answer questions upon it, but that it also knows (or at least: won't openly regurgitate) that the solution to economic problems isn't to just deport Jews.

And huge context windows aren't the answer either. My wife says me, she would like to have a fruit cake for her next birthday. I'll probably remember that piece of information (or at the very least I'll write it down)... but an AI butler? I'd be really surprised if this is still in its context space in a year, and even if it is, I would not be surprised if it weren't able to recall that fact.

And the final thing is prompts... also not the answer. We've seen it just a few days ago with Grok - someone messed with the system prompt so it randomly interjected "white genocide" claims into completely unrelated conversation [1] despite hopefully being trained on a ... more civilised dataset, and to the contrary, we've also seen Grok reply to Twitter questions in a way that suggest that it is aware its training data is biased.

[1] https://www.reuters.com/business/musks-xai-updates-grok-chat...

▲

sigmoid10 2 months ago | parent [-]

>Every day has 86400 seconds during which a human brain constantly adapts to and learns from external

That's not even remotely true. At least not in the sense that it is for context in transformer models. Or can you tell me all the visual and auditory inputs you experienced yesterday at the 45232nd second? You only learn permanently and effectively from particular stimulation coupled with surprise. That has a sample rate which is orders of magnitude lower. And it's exactly the kind of sampling that can be replicated with a run-of-the-mill persistent memory system for an LLM. I would wager that you could fit most people's core experiences and memories that they can randomly access at any moment into a 1000 page book - something that fits well into state of the art context windows. For deeper more detailed things you can always fall back to another system.

▲

bluesroo 2 months ago | parent | next [-]

Your definition of "learning" is incomplete because you're applying LLM concepts to how human brains work. An LLM only "learns" during training. From that point forward all it has is its context and vector DBs. If an LLM and vector DB is not actively interacted with, nothing happens to it. However for the brain, experiencing IS learning. And the brain NEVER stops experiencing.

Just because I don't remember my experiences at second 45232 on May 22, doesn't mean that my brain was not actively adapting to my experiences at that moment. The brain does a lot more learning than just what is conscious. And then when I went to sleep the brain continued pruning and organizing my unconscious learning for the day.

Seeing if someone can go from token to freeform physical usefulness will be interesting. I'm of the belief that LLMs are too verbose and energy intensive to go from language regurgitation machines to moving in the real world according to free form prompting. It may be accomplishable with the vast amount of hype investment, but I think the energy requirements and latency will make an LLM-based approach economically infeasible.

	▲	sigmoid10 2 months ago \| parent [-]
		> I'm of the belief that LLMs are too verbose and energy intensive to go from language regurgitation machines to moving in the real world according to free form prompting. This is not just possible, it is already happening. It just gets drowned in the media noise about chatbots. Look at some current research in this area (e.g. by Nvidia last year).

▲

ewoodrich 2 months ago | parent | prev [-]

> You only learn permanently and effectively from particular stimulation coupled with surprise.

This is just, not true. A single 2min conversation with emotional or intellectual resonance can significantly alter a human’s thought process for years. There are some topics where every time they come up directly or analogously I can recall something a teacher told me in high school that “stuck” with me for whatever reason. And it isn’t even a “core” experience, just something that instantly clicked for my brain and altered my problem solving. At the time, there’s no heuristic that could predict how or why that particular interaction should have that kind of staying power.

Not to mention, experiences that subtly alter thinking or behavior just by virtue of providing some baseline familiarity instead of blank slate problem solving or routine. Like how you subtly adjust how you interact with coworkers based on the culture of your current company over time vs the last without any “flash” of insight required.

	▲	sigmoid10 2 months ago \| parent \| next [-]
		You are just rephrasing things without using terms commonly used in research. I used "surprise" because it is a) true (if not complete) and b) easy to understand for people outside of the field. The correct term you are looking for is "arousal" (not necessarily sexual). There is tons of research on the fact that arousal enhances memory formation that would otherwise need many, many repetitions. But it also inhibits memorisation of nearby events. So you either need a very particular emotional state to remember a specific thing or massive repetitions to remember many new things. There's no way to cheat the sample inefficiency of your own brain. And for LLMs we have only figured out the first one, at least without external algorithms. [1] https://www.nature.com/articles/nrn1052 [2] https://pubmed.ncbi.nlm.nih.gov/26151918/
	▲	hnaccount_rng 2 months ago \| parent \| prev [-]
		But that resonance is a form of surprise. You are just using different words for the same context. At the same time: You are correct in the sense, that this "surprise" is completely ignored by today's LLMs. They only use this in training mode and not for continuous learning. Whether one can find a sufficiently useful definition of "surprise" to use auxiliary "learning systems" (vector DBs or system prompts) has yet to be shown

▲

losvedir 2 months ago | parent | prev | next [-]

> Continuous training is the key ingredient. Humans can use existing knowledge and apply it to new scenarios, and so can most AI. But AI cannot permanently remember the result of its actions in the real world, and so its body of knowledge cannot expand.

I think it depends on how you look at it. I don't want to torture the analogy too much, but I see the pre-training (getting model weights out of an enormous corpus of text) as more akin to the billions of years of evolution that led to the modern human brain. The brain still has a lot to learn once you're born, but it already also has lots of structures (e.g. to handle visual input, language, etc) and built-in knowledge (instincts). And you can't change that over the course of your life.

I wouldn't be surprised if we ended up in a "pre-train / RAG / context window" architecture of AI, analogously to "evolution / long term memory / short term memory" in humans.

▲

epolanski 2 months ago | parent | prev | next [-]

> Humans can use existing knowledge and apply it to new scenarios, and so can most AI

Doesn't the article states that this is not true? AI cannot apply to B what it learned about A.

▲

mschuster91 2 months ago | parent [-]

Well, ChatGPT knows about the 90s Balkan wars, a topic to which LWT hasn't made an episode that I'm aware of, and yet I can ask it to write a script for a Balkan wars episode that reads surprisingly like John Oliver while being reasonably correct.

	▲	epolanski 2 months ago \| parent [-]
		Essentially Carmack pointed in the slides that teaching AI to play game a, b or c didn't improve AI at all at learning game d from scratch. That's essentially what we're looking for when we talk about general intelligence, the capability to adapting what we know to what we know nothing about.

▲

aatd86 2 months ago | parent | prev [-]

it's more than that. Our understanding from space and time could be stemming from continuous training. Every time we look at something, there seems to be a background process that is categorizing items that are on the retinal image.

This is a continuous process.

▲

gregdeon 2 months ago | parent | prev | next [-]

I watched the talk live. I felt that his main argument was that Atari _looks_ solved, but there's still plenty of value that could be gained by revisiting these "solved" games. For one, learning how to play games through a physical interface is a way to start engaging with the kinds of problems that make robotics hard (e.g., latency). They're also a good environment to study catastrophic forgetting: an hour of training on one game shouldn't erase a model's ability to play other games.

I think we could eventually saturate Atari, but for now it looks like it's still a good source of problems that are just out of reach of current methods.

▲

koolala 2 months ago | parent [-]

Is a highly specialized bespoke robot for a Atari controller really that different? If anyone cared about latency they could have added it to the emulated controls and video with random noise.

	▲	gregdeon 2 months ago \| parent [-]
		I think it is. Latency was just one of the problems he described. A physical controller sometimes adds "phantom inputs" as the joystick transitions between two inputs. Physical actuators also slow down with wear. A physical Atari-playing robot needs to learn qualitatively different strategies that are somewhat more robust to these problems. Emulators also let the bot take as much time as it needs between frames, which is much easier than playing in real time. To me, all of this makes a physical robot seem like a decent way to start engaging with problems that come up in robotics but not simulated games.

▲

Buttons840 2 months ago | parent | prev | next [-]

My impression is that Atari was 80% solved, and then researchers and companies moved on.

A company solves self-driving 80% of the way and makes a lot of VC cash along the way. Then they solve intelligent chatbots 80% of the way and make a lot of VC cash along the way. Now they're working on solving humanoid robotics 80% of the way... I wonder why?

In the end, we have technology that can do some neat tricks, but can't be relied upon.

There are probably still some very hard problems in certain Atari games. Only the brave dare tackle these problems, because failure comes sharp and fast. Whereas, throwing more compute at a bigger LLM might not really accomplish anything, but we can make people think it accomplished something, and thus failure is not really possible.

▲

newsclues 2 months ago | parent | prev [-]

Being highly used in the past is good, it's a benchmark to compare against.