Remix.run Logo
aezart 3 hours ago

Regarding the interactions shown in the screenshots:

LLMs are pattern-matching machines. They keep the pattern going. Once "the agent disobeys the human's instructions" has made its way into the context, that is the pattern that it's going to keep matching. No amount of telling it to stop will make it stop.

The only possible solution is excising it from context and replacing it with examples of it doing the right thing. Given that these models have massive context windows now and much of the output is hidden from the user, that's becoming less viable.