| ▲ | stingraycharles 4 hours ago |
| I’m super interested since it seems like you have given everything a lot of thought and effort, but I am not sure I understand it. When I think of sandboxes, I think of isolated execution environments. What does forking sandboxes bring me? What do your sandboxes bring me in general? Please take this in the best possible way: I’m missing a use case example that isn’t abstract and/or small. What’s the end goal here? |
|
| ▲ | benswerd 3 hours ago | parent | next [-] |
| So isolation is correct. Forking a sandbox gives you multiple exact duplicates of an isolated environment. When your coding agent has 10 ideas for what to do, it needs to be able to evaluate each of them in isolation to compare them correctly. If you're building a website-testing agent and, halfway through a site, with a form half filled out and a session ongoing, it realizes it wants to test 2 things in isolation, forking is the only way. We also envision this powering the next generation of dev cycles: "AI agent, go try these 10 things and tell me which works best." The AI forks the environment 10 times, gets 10 exact copies, does the thing in each of them, evaluates the results, then takes the best option. |
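The fork-evaluate-pick loop described above can be sketched with plain directories standing in for sandbox forks (the `fork-*` names and the scoring step are made up for illustration; a real sandbox fork duplicates the whole VM, not just a folder):

```shell
set -e
# Toy stand-in for sandbox forking: copy a base workspace N times, let each
# "fork" try a different idea, evaluate every attempt, keep the best one.
mkdir -p /tmp/fork-eval && cd /tmp/fork-eval && rm -rf base fork-* best.txt
mkdir base && echo 0 > base/score.txt

best=""; best_score=-1
for i in 1 2 3; do
  cp -r base fork-$i                    # "fork" the environment
  echo $((i * 10)) > fork-$i/score.txt  # each fork tries a different idea
  score=$(cat fork-$i/score.txt)        # evaluate the attempt in isolation
  if [ "$score" -gt "$best_score" ]; then best_score=$score; best=fork-$i; fi
done
echo "$best" > best.txt
cat best.txt   # fork-3
```

The point is the shape of the loop: every attempt starts from an identical copy, so the results are actually comparable.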
| |
| ▲ | indigodaddy 3 hours ago | parent [-] | | Yep, I can see this, especially when the agent is spinning up test servers/smoke tests and you don't want those conflicting. How do we reconcile all the potentially different git hashes though, upstream etc.? (This might have an easy answer; I'm not super proficient with git, so forgive me.) | | |
| ▲ | benswerd 3 hours ago | parent [-] | | So we recommend a branch per fork; merge what you like. You currently have to change the branch on each fork individually, and that's unlikely to change in the short term due to the complexity of git internals, but it's not that hard to do yourself: `git checkout -b fork-{whateverDiscriminator}` |
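A minimal local simulation of the branch-per-fork flow, using two clones as stand-ins for two sandbox forks (the directory names and `fork-*` branch names are illustrative, not part of any product API):

```shell
set -e
# At the git level each sandbox fork is just another working copy of the repo,
# so we can simulate forking with git clone.
mkdir -p /tmp/fork-demo && cd /tmp/fork-demo && rm -rf original fork-1 fork-2

git init -q original
(cd original && git config user.email a@b.c && git config user.name demo \
  && echo base > app.txt && git add . && git commit -qm base)

# "Fork" twice and give each fork its own branch, as recommended above.
for i in 1 2; do
  git clone -q original fork-$i
  (cd fork-$i && git config user.email a@b.c && git config user.name demo \
    && git checkout -q -b fork-$i \
    && echo "attempt $i" >> app.txt && git commit -qam "attempt $i")
done

# Merge the winning attempt (say fork-2) back into the original.
cd original
git fetch -q ../fork-2 fork-2:fork-2   # pull the fork's branch into a local branch
git merge -q fork-2                    # app.txt now has base + attempt 2
```

Because each fork commits on its own branch, the histories never collide; the original just fetches whichever branch won and merges it.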
|
|
|
| ▲ | wsve 3 hours ago | parent | prev [-] |
| Agreed, the thing I'd be most interested in is the isolated execution environment you mentioned. Agents running on autopilot are powerful. Agents running unsupervised on a machine with developer permissions and certificates, where anything could influence the agent to act on an attacker's behalf, are terrifying. |
| |
| ▲ | benswerd 3 hours ago | parent [-] | | I recommend running the agent harness outside of the computer. The mental model I like to use is the computer is a tool the agent is using, and anything in the computer is untrusted. | | |
| ▲ | jeremyjh 3 hours ago | parent | next [-] | | I would recommend not giving an agent the full run of any computing environment. Do you handle fine-grained internet access controls and credential injection like OpenShell does? | | |
| ▲ | benswerd 3 hours ago | parent [-] | | I used to believe this, but I think the next generation of agents is much more autonomous and just needs a computer. The work of a developer is open-ended, so we use a computer for it. We don't try to box developers into small, granular screwdrivers for each small task. That's what's coming to all agents: they might want to run some analysis with Python, generate a website/document in TypeScript, or store data in markdown files or in MongoDB. I expect them to get much more autonomous and, with that, to end up just needing computers like us. |
| |
| ▲ | croes 3 hours ago | parent | prev [-] | | The problem is the agent, which should be treated as untrusted. The computer isn't the problem. | | |
| ▲ | benswerd 3 hours ago | parent [-] | | Kind of. The chat logs of the agent are trustworthy, as is any telemetry you have on it or coming out of the VM. Its behavior should be treated as probabilistic and therefore untrustworthy. |
|
|
|