Remix.run Logo
andix 4 hours ago

Like I said: sandboxing doesn't solve the problem.

As long as the agent creates more than just text, it can leak data. If it can access the internet in any manner, it can leak data.

The models are extremely creative and good at figuring out stuff, even circumventing safety measures that are not fully air tight. Most of the time they catch the deception, but in some very well crafted exploits they don't.