tptacek 3 hours ago

There are other models. Eschew the sandbox. Give the agent a computer, with all the trimmings, but keep that computer segregated from sensitive resources. Tokens are a solved problem: tokenize them[1] or do something equivalent with a proxy. The same thing goes for secrets.
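To make the proxy idea concrete, here's a minimal sketch (all names hypothetical — this is not Fly.io's actual tokenizer): the agent's machine holds only an opaque placeholder, and an egress proxy it must route through swaps that placeholder for the real credential, scoped to an allowlist of hosts. A prompt-injected agent can then leak nothing but the placeholder.

```python
# Illustrative sketch only -- not Fly.io's tokenizer. All names are made up.
# The VAULT lives on the proxy host; the agent's machine never sees it.
VAULT = {
    "tok_gh_123": ("Authorization", "Bearer ghp_real_secret"),
}

def rewrite_outbound(headers: dict, host: str, allowed_hosts: set) -> dict:
    """Replace a placeholder credential with the real one, but only
    for hosts the placeholder is scoped to."""
    out = dict(headers)
    placeholder = out.get("Authorization", "")
    if placeholder in VAULT:
        if host not in allowed_hosts:
            raise PermissionError(f"{placeholder!r} is not valid for {host}")
        header, real_value = VAULT[placeholder]
        out[header] = real_value
    return out
```

The agent only ever handles `tok_gh_123`; the real token is injected on the way out, and only toward hosts it's scoped to.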

A lot of this post presents false dichotomies. It assumes the existence of a sandbox that is by definition ephemeral or "cattle-like". Why? There are reasons to do that and reasons not to do that. You can have a durable computer with a network identity and full connectivity, and you can have that computer spin down and stop billing when not in use.

There are a zillion different shapes for addressing these problems, and I'm twitchy because I think people are super path-dependent right now, and it's causing them to miss a lot of valuable options.

[1]: https://fly.io/blog/tokenized-tokens/ (I work at Fly.io but the thing this post talks about is open source).

nvader 3 hours ago | parent | next [-]

I'm also very excited by the different shapes for solving problems in this space, but a little worried that the path dependence is ACTUALLY somewhat warranted, since "popular harness engineering is just claude-wrapping" is a self-fulfilling prophecy today.

I've heard many claims that because LLMs are tuned to specific harnesses, we should expect worse performance from novel architectures. That seems to make people reluctant to put effort into inventing them.

aluzzardi 2 hours ago | parent [-]

Author here.

I’m worried about the same (models tuned for specific harnesses).

We actually work around that by respecting the "contract". For instance, our harness's Bash tool signature is exactly the same as Claude's. We do our sandboxing under the hood and respond using the same format.

In the “eyes” of the model there’s no difference between what Claude does and what we do (even though the implementation is completely different).

We basically use Claude's tools as an API contract.
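A sketch of what "same contract, different implementation" can look like (the field names below are my assumption about the tool schema, not a published spec): the executor keeps the exact input/output shape the model was tuned on, while the execution behind it is swappable.

```python
import subprocess

# Assumed tool contract -- field names are illustrative, not Claude's
# published schema. What matters is that both implementations agree on it.
BASH_TOOL = {
    "name": "Bash",
    "input_schema": {
        "type": "object",
        "properties": {
            "command": {"type": "string"},
            "timeout": {"type": "integer"},
        },
        "required": ["command"],
    },
}

def handle_bash(tool_input: dict) -> dict:
    """Honor the contract; the implementation behind it is swappable."""
    # Stand-in: a real harness could forward this to a remote sandbox
    # instead of exec'ing locally, returning the same result shape.
    p = subprocess.run(tool_input["command"], shell=True, capture_output=True,
                       text=True, timeout=tool_input.get("timeout"))
    return {"stdout": p.stdout, "stderr": p.stderr, "exit_code": p.returncode}
```

From the model's side, a response in this shape looks the same no matter which backend produced it.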

aluzzardi 2 hours ago | parent | prev | next [-]

Author here.

This is an interesting and novel field, so I’m not pretending I know the answers, but this is what worked for us :)

At the end of the day, and oversimplifying things: why would I want to run a for loop that calls an API (the LLM) inside its own dedicated sandbox/computer?

When the model wants to run a command, it'll tell you so. That doesn't need to be a local exec; you can run it anywhere, and the model won't know the difference.

The agent loop itself doesn't need sandboxing. In many cases, most tool calls don't require sandboxing either. For the tools that do require a computer, you can route those requests there when needed, rather than running the whole harness inside that sandbox.

To me, running the agent loop in the sandbox itself feels like saying "you should run your API in your DB container because it'll talk to it at some point".
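A tiny dispatcher sketch of that routing (all names hypothetical): the loop runs as an ordinary process, tool calls that don't need a computer are answered in-process, and only the ones that execute something get forwarded to the sandbox.

```python
# Hypothetical sketch of routing tool calls from an unsandboxed agent loop.
def search_docs(query: str) -> str:
    # Pure lookup -- no sandbox needed.
    return f"results for {query}"

LOCAL_TOOLS = {"search_docs": search_docs}

def sandbox_exec(tool_name: str, args: dict) -> str:
    # Stand-in for an RPC to a remote sandbox/computer.
    return f"[sandbox] {tool_name}({args})"

def dispatch(tool_name: str, args: dict) -> str:
    """The model just sees a tool result either way."""
    if tool_name in LOCAL_TOOLS:
        return LOCAL_TOOLS[tool_name](**args)
    return sandbox_exec(tool_name, args)
```

The sandbox here is just another backend behind `dispatch`; nothing about the loop itself has to live inside it.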

stavros an hour ago | parent | prev [-]

I agree with the argument that there are many more than two ways to do this. When I built my AI assistant (https://stavrobot.stavros.io/), for example, I implemented an architecture that combines both of the approaches detailed in the post. The harness runs both inside and outside the container simultaneously (I didn't want the harness touching the system, and I didn't want LLM-generated code touching the harness).

It's all tradeoffs, and picking the ones that work for what you want to do is what architecture is. The more informed you are about the tradeoffs, the better you can make your architecture.