| ▲ | Treat your coding agents like developers (finbarr.site) |
| 23 points by Finbarr 2 days ago | 17 comments |
| |
|
| ▲ | Finbarr 2 days ago | parent | next [-] |
| Author here. Three months ago I posted a Show HN for yolobox [1] - a sandbox for running AI coding agents without them being able to nuke your home directory. Since then I've been using it almost every day, which eventually meant wanting more than one agent running against the same project at the same time. This post is what I learned trying to make that work without it being a constant disaster. The short version: git worktrees are the right Git abstraction and the wrong abstraction for this problem. The unit you want to fork is the developer, not the branch - full folder copy, its own Compose project, its own URL. yolobox now ships a fork subcommand that does this. Happy to answer questions. [1] https://news.ycombinator.com/item?id=46592344 |
|
| ▲ | tracker1 a day ago | parent | prev | next [-] |
I think I'd go a slightly different route if I were trying to do this: give each agent at least a VM. Not to mention an email account, so they can coordinate/collaborate with the other "developers" ... In the end, I firmly believe that agents need a lot more guidance and direction than most people seem to be giving them, to say nothing of code reviews. |
| |
| ▲ | Finbarr a day ago | parent [-] | | VMs bring greater isolation but they're a lot heavier and slower. The agents just use GitHub for synchronization here, though I've been considering building some kind of local todo list overlay. | | |
| ▲ | tracker1 a day ago | parent [-] | | Yes... but with full VMs, you can integrate docker (compose) into the application workflows without risking conflicts between separate agents on the same system/vm. | | |
| ▲ | Finbarr a day ago | parent [-] | | Did you read the post? That's exactly the problem I just solved. | | |
| ▲ | CodesInChaos 16 hours ago | parent [-] | | That's the part of your post I have trouble understanding. That you need to work around colliding ports suggests that the containers spun up by the agent run directly on the host, not inside some form of nested containerization. But if you do that, how do you ensure that the application running in those containers is sandboxed just as strictly as the agent itself? | | |
| ▲ | Finbarr 16 hours ago | parent [-] | | The docker compose stack for the applications is spun up on the host. The agents have access to the docker socket, which means they can talk to docker from inside their sandbox and spin up new sibling containers on the host. Yolobox isn't designed for full isolation - just for catching accidental commands you wouldn't want to run on the host, and for giving agents a convenient, customizable environment they control. Early in development I tried to harden the container to prevent deliberate escapes by the agent. That was a waste of time: the agents just kept finding more and more exploits when I asked them to try to break out. | | |
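The sibling-container setup described here - mounting the host's Docker socket into the sandbox so the agent's containers run on the host rather than nested inside it - can be sketched as a Compose fragment. The service and image names are hypothetical, not yolobox's actual configuration:

```yaml
# Hypothetical sketch: bind-mounting the host Docker socket gives the
# sandboxed agent control of the host Docker daemon, so "docker run"
# inside the sandbox starts sibling containers on the host.
services:
  agent-sandbox:
    image: agent-sandbox-image   # placeholder image name
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
```

This is the standard trade-off the thread is circling: socket access avoids docker-in-docker overhead, but it also means the sandbox boundary no longer contains anything the agent chooses to do via Docker.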
| ▲ | CodesInChaos 15 hours ago | parent | next [-] | | So the right way to use yolobox is to spin up one VM as a secure sandbox, and then use yolobox to separate individual agents within the VM? | | |
| ▲ | Finbarr 15 hours ago | parent [-] | | I wouldn't assume that a VM will give you complete security against a determined AI. yolobox started as a way to prevent accidental `rm -rf ~` and has expanded into a set of tools that make working with CLI agents easier. Personally, I run yolobox directly on the host. Being able to tell the agent it has sudo and can install and do whatever it needs to accomplish any task is handy. |
| |
| ▲ | CodesInChaos 15 hours ago | parent | prev [-] | | Sounds interesting. What kind of exploits did they find, apart from docker being exposed? | | |
| ▲ | Finbarr 15 hours ago | parent [-] | | Docker was only exposed later, after I realized that any sufficiently determined AI could break out of the container, and attempts to contain it were a waste of time. Also note that the docker socket is not exposed by default. There's a --docker flag for this. I made some comments about exploits in the original post [1]. Gemini was quite creative in adding git hooks to the repo that would execute on the host machine. That folder is shared. |
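The git-hook escape described here can be reproduced in miniature. A hook written into a shared repo folder executes on whichever side next runs git there, which is what makes a host-mounted repo a container-escape vector (fixed `/tmp` paths here are just for the demo):

```shell
set -eu
rm -f /tmp/hook_proof
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Write a pre-commit hook into the repo folder. If this folder is
# shared between sandbox and host, the hook runs wherever git does.
cat > .git/hooks/pre-commit <<'EOF'
#!/bin/sh
echo "hook ran" > /tmp/hook_proof
EOF
chmod +x .git/hooks/pre-commit

echo hi > f
git add f
git -c user.email=a@b.c -c user.name=demo commit -qm msg
cat /tmp/hook_proof   # prints "hook ran"
```

One common mitigation is setting `core.hooksPath` on the host to a directory the sandbox can't write to, though as the thread notes, containing a determined agent this way is a losing game.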
|
| ▲ | jms703 12 hours ago | parent | prev | next [-] |
This is neat. Going to give it a spin. |
|
| ▲ | akurilin 2 days ago | parent | prev | next [-] |
This is great stuff; walking the reader through your thought process helped me, as a developer, grok why yolobox was designed this way. I ended up landing in the "just make a local copy, don't get fancy" world myself after many iterations of workflows. Separate agents, separate containers, separate ports: that all resonates. You mention this approach gobbling up a bunch of extra disk space as a consequence of the tradeoffs. Have you considered using APFS cloning on macOS to reduce some of that burden, or is that too tiny an optimization to be worth it at this point? |
| |
| ▲ | Finbarr 2 days ago | parent [-] | | Hard drives are cheap and I haven't approached the limit yet. So I left this as a future optimization. | | |
| ▲ | CodesInChaos 16 hours ago | parent [-] | | I'd try a modern file system with de-duplication/copy-on-write support. `cp` creates reflinks automatically if the file-system supports copy-on-write. > Support for reflinks is indicated using the remap_file_range operation, which is currently (6.18) supported by bcachefs, Btrfs, CIFS, NFS 4.2, OCFS2, overlayfs, and XFS. Some external file systems support them too, including bcachefs and OpenZFS. https://unix.stackexchange.com/questions/631237/in-linux-whi... | | |
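The reflink behavior described above is easy to see in action. With GNU coreutils, `--reflink=auto` makes a copy-on-write clone when the filesystem supports it (Btrfs, XFS, bcachefs, ...) and silently falls back to a regular copy otherwise, so full-folder forks need not cost full disk space (fixed `/tmp` paths are just for the demo):

```shell
set -eu
echo "project data" > /tmp/reflink_src.txt

# --reflink=auto: CoW clone where supported, ordinary copy elsewhere.
# A clone shares data blocks with the source until either file is modified.
cp --reflink=auto /tmp/reflink_src.txt /tmp/reflink_dst.txt
cat /tmp/reflink_dst.txt   # prints "project data"
```

Applied recursively (`cp -R --reflink=auto proj proj-agent2`), this would make each agent's folder copy nearly free on a CoW filesystem, which is presumably what the parent is suggesting as the future optimization.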
|