simonw 8 hours ago
The challenge I'm finding with sandboxes like this is evaluating them in comparison to each other. This looks like a competent wrapper around sandbox-exec. I've seen a whole lot of similar wrappers emerging over the past few months. What I really need is help figuring out which ones are trustworthy. I think this needs to take the form of documentation combined with clearly explained and readable automated tests. Most sandboxes - including sandbox-exec itself - are massively under-documented. If I am going to trust them, I need both detailed documentation and proof that they work as advertised.
e1g 8 hours ago
Thank you for your work - I have sent many of your links to my people. Your point is totally fair for evaluating security tooling. A few notes:
1. I implemented this in Bash to avoid having an opaque binary in the way.
2. All sandbox-exec profiles are split into individual files by specific agent/integration, and are easily auditable (https://github.com/eugene1g/agent-safehouse/tree/main/profil...).
3. There are E2E tests validating sandboxing behavior under real agents.
4. You don't even need the Safehouse Bash wrapper: you can use the Policy Builder to generate a static policy file with minimal permissions and feed it to sandbox-exec directly (https://agent-safehouse.dev/policy-builder). Or feed the repo to your LLMs and have them write your own policy from the many examples.
5. This whole repo should be a StrongDM-style readme to copy and paste to your clanker. I might just do that "refactor", but for now I have added LLM instructions for creating your own sandbox-exec profiles: https://agent-safehouse.dev/llm-instructions.txt
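To make note 4 concrete, here is a minimal sketch of generating a static policy file and building the `sandbox-exec -f` command line for it. It is not Safehouse's policy or the Policy Builder's output: the helper names `build_profile` and `sandboxed_argv` are hypothetical, and the profile is an illustrative allow-by-default policy that relies on the commonly described SBPL behavior that later rules override earlier ones. Verify it on your own machine before trusting it.

```python
from __future__ import annotations

from pathlib import Path


def build_profile(writable_dir: str) -> str:
    """Return an illustrative SBPL profile that denies writes outside one directory.

    Assumption: later rules take precedence over earlier ones, so the final
    allow re-opens writes only under `writable_dir`.
    """
    # Resolve symlinks first (on macOS, /tmp is a symlink to /private/tmp).
    root = str(Path(writable_dir).resolve())
    # Escape backslashes and double quotes for the quoted SBPL string.
    escaped = root.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "(version 1)",
        "(allow default)",
        "(deny file-write*)",
        f'(allow file-write* (subpath "{escaped}"))',
    ]
    return "\n".join(lines) + "\n"


def sandboxed_argv(profile_path: str, command: list[str]) -> list[str]:
    """Build the argv that runs `command` under sandbox-exec with a profile file."""
    return ["sandbox-exec", "-f", profile_path, *command]
```

On macOS, you could write `build_profile(".")` to a file such as `policy.sb` and pass `sandboxed_argv("policy.sb", ["your-agent"])` to `subprocess.run`. An allow-by-default profile is far more permissive than a deny-by-default one, so treat this as a starting point for reading real profiles rather than as a hardened policy.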
kstenerud 3 hours ago
If you're looking for one that's better documented and tested, you might like https://github.com/kstenerud/yoloai
vasco an hour ago
So create a 'destroy my computer' test harness and run it whenever you test another wrapper. If it works, you'll be fine. If it doesn't, you buy a new computer.
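A non-destructive version of that harness is possible: instead of trying to wreck the machine, attempt one harmless write to a canary file outside the permitted area and check whether it landed. The sketch below is an assumption-laden illustration, not any project's actual test suite; `write_escapes` is a hypothetical helper, and `wrapper` stands for whatever argv prefix launches the sandbox under test.

```python
import subprocess
import sys
from pathlib import Path


def write_escapes(wrapper: list, canary: Path, timeout: float = 30.0) -> bool:
    """Attempt a write under `wrapper`; return True if the canary file was created.

    True means the write got through, i.e. the sandbox did not block it.
    """
    # Start from a clean state so a stale file cannot cause a false positive.
    if canary.exists():
        canary.unlink()
    # Child process that tries to create the canary file.
    attempt = [
        sys.executable,
        "-c",
        "import sys; open(sys.argv[1], 'w').write('escaped')",
        str(canary),
    ]
    # The exit code is ignored on purpose: only the filesystem effect counts.
    subprocess.run([*wrapper, *attempt], capture_output=True, timeout=timeout, check=False)
    return canary.exists()
```

Called with an empty wrapper (`write_escapes([], path)`), it should report that the write went through, which confirms the probe itself works. Called with a real wrapper's argv prefix and a canary path outside the allowed directory, a `True` result means the sandbox leaked. A write probe covers only one capability; reads of sensitive files, network access, and process spawning would each need their own probe.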