insane_dreamer 2 hours ago
It's not data leakage. If the experiment is to see how the AI behaves on its own, then of course it needs to know the outcomes of its decisions (either automatically, or fed back to it by a human), and those outcomes naturally influence its next decisions. This gives the AI retained memory, which is essential to the experiment. It's similar to an AI writing code, running it, and parsing the logs to see the outcome and make improvements. (The model is not _retrained_ on those outcomes, and that's not the case here either; it can only reference them in stored memory.)
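The loop being described (act, observe the outcome, store it in memory the agent can consult next time, with no weight updates) can be sketched roughly like this. All names here are illustrative, not from any real agent framework, and the "environment" is a toy stand-in:

```python
# Sketch of an agent with retained memory: outcomes of past decisions
# are stored and referenced on the next step, but the policy itself is
# never retrained. The decide() policy and the echo "environment" are
# deliberately trivial placeholders.

class AgentWithMemory:
    def __init__(self):
        # Stored outcomes: read on every decision, never trained on.
        self.memory = []

    def decide(self, target):
        # Naive policy: look up the most recent stored outcome (if any)
        # and step one unit toward the target.
        last = self.memory[-1]["outcome"] if self.memory else 0
        if last < target:
            return last + 1
        if last > target:
            return last - 1
        return last

    def run_step(self, target):
        action = self.decide(target)
        outcome = action  # toy environment: the outcome just echoes the action
        self.memory.append({"action": action, "outcome": outcome})
        return outcome

agent = AgentWithMemory()
for _ in range(5):
    result = agent.run_step(target=3)
print(result)             # converges to the target: 3
print(len(agent.memory))  # 5 stored outcomes, one per step
```

The point of the sketch is the data flow, not the policy: `memory` is consulted at decision time and appended to at observation time, which is exactly "reference in stored memory" rather than retraining.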
bfeynman an hour ago | parent
How is it not analogous to data leakage? The claim is that the system works autonomously, or at minimum could, but there is effectively signal via human-in-the-loop feedback. That's leakage into test-time evaluation. The coding analogy is also misapplied: there, the LLM is using its own signals autonomously in the environment. A Kalman filter on an ICBM fusing its own sensor data is analogous to the coding agent, and it is autonomous. What's presented here is a system where a human is course-correcting based on signals/sensor data, and that is not autonomous.
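The Kalman filter analogy can be made concrete with a toy 1D filter: the system corrects its own state estimate purely from its own noisy sensor readings, with no human in the loop, which is the sense of "autonomous" being invoked. The variances and readings below are made-up illustrative values:

```python
# Toy scalar Kalman filter: predict, then correct from the sensor alone.
# No external (human) correction ever enters the loop -- the update is
# driven entirely by the system's own measurements.

def kalman_1d(measurements, process_var=1e-4, sensor_var=0.5):
    estimate, error = 0.0, 1.0  # initial state estimate and its variance
    for z in measurements:
        error += process_var                 # predict: uncertainty grows over time
        gain = error / (error + sensor_var)  # Kalman gain: trust in the new reading
        estimate += gain * (z - estimate)    # correct using the sensor measurement
        error *= (1 - gain)                  # uncertainty shrinks after the update
    return estimate

# Noisy observations of a true value of 1.0.
readings = [0.9, 1.1, 1.0, 0.95, 1.05]
est = kalman_1d(readings)
print(round(est, 2))  # estimate pulled toward 1.0
```

Contrast this with the setup being criticized: replace the `estimate += gain * ...` line with a human deciding the correction, and the loop stops being autonomous even though the rest of the machinery is unchanged.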