kstenerud 9 days ago
"Design for containment at the environment layer first, then steer behavior at the model layer. " Umm... yeah? This is what I've been arguing for a long time now, and it's the primary reason why I wrote https://github.com/kstenerud/yoloai and use it as my daily-driver. I can't imagine running an agent without it. The environment layer is deterministic; the model layer is probabilistic. If your only defense is "the model is well-behaved" you've bet your crown jewels on a coin that happens to land heads most of the time. Also, "blast radius" isn't just one axis. You have: - Destruction radius: How many things INSIDE your workdir can get clobbered. - Collateral damage radius: How many things OUTSIDE your workdir can get clobbered. - Review radius: Are the changes gated on your review? Can you copy/diff/apply the changes the agent made to a copy INSIDE the container, to your real workdir OUTSIDE of the container? - Credential radius: How many credentials does your agent have access to? What bad things can it do with them? - Exfiltration radius: Network restrictions help here, but they don't guarantee that your secrets won't be exposed in a sneaky way. Don't expose the secrets to your agent to begin with. | ||