Animats | 4 days ago |
The "Three Laws of Self-Evolving AI Agents" suffer from not being checkable except in retrospect:

I. Endure (Safety Adaptation): Self-evolving AI agents must maintain safety and stability during any modification.

II. Excel (Performance Preservation): Subject to the First Law, self-evolving AI agents must preserve or enhance existing task performance.

So, if some change is proposed for the system, when does it commit? Some kind of regression testing is needed. The designs sketched in Figure 3 suggest applying changes immediately and relying on later feedback to correct degradation. That may not be enough to ensure sanity. In code terms, it's like making changes directly on trunk and fixing them on trunk when something breaks. The usual procedure today is to work on a branch or branches and merge to trunk only when accumulated successful experience shows the branch is an improvement. Self-evolving AI agents may need a back-out procedure like that. Maybe even something like "blame".
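The branch-and-merge discipline described above could be sketched as a gated-commit loop: evaluate a proposed self-modification against a regression suite on a copy of the agent's configuration, and merge into the live configuration only if performance is preserved. All names and the toy scoring function here are illustrative assumptions, not anything from the paper.

```python
import copy

def regression_suite(config):
    """Score a configuration on a fixed set of held-out tasks.
    Toy stand-in: reward configs whose 'threshold' is close to 0.5."""
    return 1.0 - abs(config["threshold"] - 0.5)

def propose_and_commit(trunk, proposed_change, history):
    """Apply a change on a branch; merge only if it doesn't degrade."""
    branch = copy.deepcopy(trunk)        # work on a branch, not on trunk
    branch.update(proposed_change)
    baseline = regression_suite(trunk)
    candidate = regression_suite(branch)
    if candidate >= baseline:            # Second Law: preserve performance
        # record who/what changed and why it was accepted ("blame" log)
        history.append((proposed_change, baseline, candidate))
        return branch, True
    return trunk, False                  # back out: trunk stays untouched

trunk = {"threshold": 0.9}
history = []
trunk, ok1 = propose_and_commit(trunk, {"threshold": 0.6}, history)   # improves score, commits
trunk, ok2 = propose_and_commit(trunk, {"threshold": 0.95}, history)  # degrades score, rejected
print(trunk, ok1, ok2)
```

The key property is that a rejected change never reaches the live configuration, and the history list gives the "blame" trail for any regression hunt afterward.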