The challenge I'm finding with sandboxes like this is evaluating them in comparison to each other.

This looks like a competent wrapper around sandbox-exec. I've seen a whole lot of similar wrappers emerging over the past few months.

What I really need is help figuring out which ones are trustworthy.

I think this needs to take the form of documentation combined with clearly explained and readable automated tests.

Most sandboxes - including sandbox-exec itself - are massively under-documented.

If I am going to trust them, I need both detailed documentation and proof that they work as advertised.
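
To illustrate the kind of readable automated test meant here, below is a minimal sketch in Python. It assumes a macOS host with `sandbox-exec` on the PATH. The profile is a deliberately tiny hand-written example, not one taken from any of the wrappers discussed, and the function names are hypothetical. It has not been verified against any particular macOS release, which is why it includes a positive control: a sandbox that blocks everything would otherwise pass the "write was denied" check trivially.

```python
import os
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def build_profile(writable_dir: str) -> str:
    """Return a tiny SBPL profile: everything allowed except writes,
    which are only permitted under writable_dir.
    Assumption: the path contains no double quotes or backslashes."""
    return "\n".join([
        "(version 1)",
        "(allow default)",
        "(deny file-write*)",
        f'(allow file-write* (subpath "{writable_dir}"))',
    ])


def sandbox_command(profile: str, argv: list[str]) -> list[str]:
    """Wrap argv so it runs under sandbox-exec with an inline profile (-p)."""
    return ["sandbox-exec", "-p", profile, *argv]


def sandboxed_touch(profile: str, target: Path) -> bool:
    """Try to create target from inside the sandbox; True if the file exists afterwards."""
    subprocess.run(
        sandbox_command(profile, ["/usr/bin/touch", str(target)]),
        capture_output=True,
        check=False,
    )
    return target.exists()


def check_write_confinement() -> None:
    """Positive control (write inside succeeds) plus the real check (write outside fails)."""
    with tempfile.TemporaryDirectory() as inside, tempfile.TemporaryDirectory() as outside:
        # Resolve symlinks first: macOS temp dirs usually sit behind /var -> /private/var.
        inside_dir = Path(os.path.realpath(inside))
        outside_dir = Path(os.path.realpath(outside))
        profile = build_profile(str(inside_dir))
        assert sandboxed_touch(profile, inside_dir / "ok.txt"), \
            "write inside the allowed directory was blocked"
        assert not sandboxed_touch(profile, outside_dir / "escape.txt"), \
            "write outside the allowed directory succeeded"


if __name__ == "__main__":
    if sys.platform == "darwin" and shutil.which("sandbox-exec"):
        check_write_confinement()
        print("write confinement holds")
    else:
        print("skipped: sandbox-exec is only available on macOS")
```

A test in this shape reads as a claim ("writes outside this directory fail") followed by the evidence for it, and the same pattern extends to reads, network access, and process execution.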


e1g 8 hours ago | parent | next [-]

Thank you for your work - I have sent many of your links to my people.

Your point is totally fair for evaluating security tooling. A few notes -

1. I implemented this in Bash to avoid having an opaque binary in the way.

2. All sandbox-exec profiles are split up into individual files by specific agent/integration, and are easily auditable (https://github.com/eugene1g/agent-safehouse/tree/main/profil...)

3. There are E2E tests validating sandboxing behavior under real agents

4. You don't even need the Safehouse Bash wrapper, and can use the Policy Builder to generate a static policy file with minimal permissions that you can feed to sandbox-exec directly (https://agent-safehouse.dev/policy-builder). Or feed the repo to your LLMs and have them write your own policy from the many examples.

5. This whole repo should be a StrongDM-style readme to copy&paste to your clanker. I might just do that "refactor", but for now I've added LLM instructions for creating your own sandbox-exec profiles: https://agent-safehouse.dev/llm-instructions.txt


big_toast 5 hours ago | parent [-]

I love this implementation. Do you find the SBPL deficient in any ways?

Would xcodebuild work in this context? Presumably I'd watch a log (or have an agent) and add permissions until it works?

e1g an hour ago | parent [-]

SBPL is great for filesystem controls and I haven’t hit roadblocks yet. I wish it offered more control over outbound network requests (i.e., filtering by domain), but I understand why it doesn't. Yes, Safehouse should work for xcodebuild workloads in the way you described - try to run it, watch for failures, extend the profile, try again. Your agent can do this in a loop by itself - just feed it the repo, since it contains many integrations that are not enabled by default and will help it.
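
The "watch for failures, extend the profile" step can be partly mechanized. Below is a minimal sketch in Python that turns sandbox denial lines into candidate SBPL `allow` rules for a human or agent to review. It assumes denial lines look like `Sandbox: xcodebuild(4242) deny(1) file-read-data /some/path`, which should be checked against the logs on your own system. The function name and the sample log lines are made up for illustration and do not come from Safehouse.

```python
import re

# Matches the assumed "deny(N) <operation> <absolute path>" tail of a denial line.
# Denials without an absolute path (e.g. mach-lookup of a service name) are ignored.
DENY_RE = re.compile(r"deny\(\d+\)\s+(?P<op>[a-z][a-z0-9*-]*)\s+(?P<path>/(?:.*\S)?)")


def suggest_rules(log_lines):
    """Return de-duplicated candidate SBPL allow rules, in first-seen order."""
    rules = []
    for line in log_lines:
        match = DENY_RE.search(line)
        if match is None:
            continue
        op, path = match["op"], match["path"]
        # Skip paths that would need escaping inside an SBPL string literal.
        if '"' in path or "\\" in path:
            continue
        rule = f'(allow {op} (literal "{path}"))'
        if rule not in rules:
            rules.append(rule)
    return rules


if __name__ == "__main__":
    # Hypothetical log excerpt, for illustration only.
    sample = [
        "Sandbox: xcodebuild(4242) deny(1) file-read-data /Users/me/Library/Developer",
        "Sandbox: xcodebuild(4242) deny(1) file-read-data /Users/me/Library/Developer",
        "some unrelated log line",
        "Sandbox: clang(4243) deny(1) file-write-create /private/tmp/build.o",
    ]
    for rule in suggest_rules(sample):
        print(rule)
```

The output is a list of suggestions, not something to append blindly: each rule widens the sandbox, so it is worth reviewing every one (and preferring narrow `literal` rules over broad `subpath` ones) before adding it to the profile.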


kstenerud 3 hours ago | parent | prev | next [-]

If you're looking for one that is better documented and tested, you might like https://github.com/kstenerud/yoloai

okanesen 3 hours ago | parent [-]

I'm having trouble understanding what makes this "better documented and tested". Care to elaborate on how the testing was done? What are the differences?


vasco an hour ago | parent | prev [-]

So create a 'destroy my computer' test harness and run it whenever you test another wrapper. If it works, you'll be fine. If it doesn't, you buy a new computer.