buremba 8 hours ago

My take is that, by default, agents should only take actions you can recover from. You can gradually grant more permissions and build guardrails such as extra LLM auditing, time-boxed whitelisted domains, etc. That's what I'm experimenting with: https://github.com/lobu-ai/lobu

1. Don't let it send emails from your personal account; only let it draft emails and share the link with you.

2. Use incremental snapshots, and if the agent bricks itself (it often does with Openclaw if you give it access to change its config), just do /revert to the last snapshot. I use VolumeSnapshot for lobu.ai.

3. Don't let your agents see any secrets. Swap placeholder secrets for real ones at your gateway, and put a human in the loop for secrets you care about.

4. Don't let your agents have direct outbound network access. They should only talk to your proxy, which has a strict domain whitelist. There will be cases where the agent needs to talk to other domains; I use time-box limits (only allow certain domains for the current session, say 5 minutes, and at the end of the session review all the URLs it accessed). You can also use tool hooks to audit the calls with an LLM to make sure they weren't triggered by a prompt-injection attack.

Last but not least, use proper VMs like Kata Containers and Firecracker in production, not just Docker containers.
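A minimal sketch of the time-boxed whitelist from point 4, assuming a Python egress proxy consults something like this before forwarding (all names here are hypothetical, not lobu's actual API):

```python
import time
from urllib.parse import urlparse

class TimeBoxedAllowlist:
    """Per-session domain allowlist where each grant expires after a TTL."""

    def __init__(self):
        self._grants = {}   # domain -> expiry timestamp
        self.audit_log = [] # every URL the agent tried, for end-of-session review

    def grant(self, domain, ttl_seconds=300):
        """Allow a domain for the current session, e.g. for 5 minutes."""
        self._grants[domain] = time.time() + ttl_seconds

    def is_allowed(self, url):
        """Record the attempt, then check for a live grant on the hostname."""
        domain = urlparse(url).hostname
        self.audit_log.append(url)
        expiry = self._grants.get(domain)
        return expiry is not None and time.time() < expiry

# Usage: the egress proxy checks every outbound request against the ACL.
acl = TimeBoxedAllowlist()
acl.grant("api.github.com", ttl_seconds=300)
allowed = acl.is_allowed("https://api.github.com/repos")  # True while the grant is live
blocked = acl.is_allowed("https://evil.example.com/x")    # False: never granted
```

At session end you review `acl.audit_log`, which is exactly the "look up all the URLs it accessed" step.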

alexhans 8 hours ago | parent | next [-]

That's a decent practice from the lens of reducing blast radius. It becomes harder when you start thinking about unattended systems that don't have you in the loop.

One problem I'm finding with discussions about automation or semi-automation in this space is that there are many different use cases for many different people: a software developer deploying an agent in production vs. an economist using Claude vs. a scientist throwing a swarm at common ML exploratory tasks.

Many of the recommendations will feel like too much or too little complexity for what people need, and the fundamentals get lost: intent in design, control, the ability to collaborate when necessary, and fast iteration thanks to an easy feedback loop.

AI evals, sandboxing, and observability seem like the three key pillars for maintaining intent in automation. But how to help these different audiences be safely productive while staying fast, and speak the same language when they need to build products together, is what mostly occupies my thoughts (and practical tests).

daveguy 6 hours ago | parent [-]

Current LLMs are nowhere near qualified to be autonomous without a human in the loop. They just aren't rigorous enough, especially for the "scientist throwing a swarm at common ML exploratory tasks." Judging most steps in an exploratory task requires human feedback grounded in the domain of study.

> Many of the recommendations will feel too much or too little complexity for what people need and the fundamentals get lost: intent for design, control, the ability to collaborate if necessary, fast iteration due to an easy feedback loop.

Completely agreed. This is because LLMs are atrocious at judgement, and guiding the sequence of exploration depends critically on judgement.

shich an hour ago | parent | prev | next [-]

The proxy approach for secret injection is the right mental model, but it only works if the proxy itself is hardened against prompt injection. An agent that can't access secrets directly can still be manipulated into crafting requests that leak data through side channels — URL params, timing, error messages.

The deeper issue: most of these guardrails assume the threat is accidental (agent goes off the rails) rather than adversarial (something in the agent's context is actively trying to manipulate it). Time-boxed domain whitelists help with the latter but the audit loop at session end is still reactive.

The /revert snapshot idea is underrated though. Reversibility should be the first constraint, not an afterthought.

buremba 15 minutes ago | parent [-]

> but it only works if the proxy itself is hardened against prompt injection.

Yes, I'm experimenting with using a small model like Haiku to double-check whether the request looks good. It adds quite a bit of latency, but it might be the right approach.

Honestly, it's still much like the early days of self-driving cars: you can see the car drive without your supervision, but you still need to keep an eye on where it's going.

Doublon 6 hours ago | parent | prev | next [-]

I'd like to try a pattern where agents only have access to read-only tools. They can read your emails, read your notes, read your texts, maybe even browse the internet with GET requests only...

But any action with side effects ends up in a tasks list, completely isolated. The agent can't send an email; it doesn't have such a tool. But it can prepare a reply and put it in the tasks list. Then I proofread and approve/send it myself.
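That pattern might look something like this (hypothetical names; just a sketch of the read-only-tools-plus-task-queue idea):

```python
from dataclasses import dataclass

@dataclass
class PendingTask:
    """A side-effecting action the agent proposes but cannot execute itself."""
    kind: str      # e.g. "email_reply"
    payload: dict
    approved: bool = False

class ReadOnlyToolbox:
    """The agent only gets read tools; anything with side effects
    becomes a PendingTask for a human to review, approve, and send."""

    def __init__(self, inbox):
        self._inbox = inbox  # read-only data source
        self.tasks = []

    def read_emails(self):
        return list(self._inbox)  # safe: no side effects

    def draft_reply(self, to, body):
        # Nothing is sent; the draft just lands in the queue.
        task = PendingTask(kind="email_reply", payload={"to": to, "body": body})
        self.tasks.append(task)
        return task

# Usage
box = ReadOnlyToolbox(inbox=[{"from": "alice@example.com", "subject": "hi"}])
box.draft_reply("alice@example.com", "Thanks, will do.")
# A human later reviews box.tasks and sends approved items themselves.
```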

Is there anything like that available for *Claws?

swid 4 hours ago | parent | next [-]

There is no such thing as a truly read-only GET request if we are talking about security here. Payloads with secrets can still be exfiltrated, and a server you don't control can do whatever it wants when it receives the request.

zahlman 2 hours ago | parent | prev [-]

GET and POST are merely suggestions to the server. A GET request still has query parameters; even if the server is playing by the book, an agent can still end up requesting GET http://angelic-service.example.com/api/v1/innocuous-thing?pa... and now your `dangerous-secret` is in the server logs.

You can try proxying and whitelisting its requests, but the properly paranoid option is sneaker-netting the necessary information (say, documentation for libraries, or a local package index) to a separate machine.
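If you do go the proxy route, one crude mitigation is to scan outbound query strings for known secret values before forwarding. This is a sketch only (hypothetical names), and it misses encodings, chunked exfiltration, and timing channels:

```python
from urllib.parse import urlparse, parse_qs

# Values the gateway holds; the agent should never see the real ones.
KNOWN_SECRETS = {"dangerous-secret"}

def leaks_secret(url):
    """Flag an outbound GET whose query parameters contain a known secret value."""
    qs = parse_qs(urlparse(url).query)
    return any(v in KNOWN_SECRETS for values in qs.values() for v in values)

hit = leaks_secret(
    "https://angelic-service.example.com/api/v1/innocuous-thing?param=dangerous-secret"
)  # True: the secret would land in the server's logs
ok = leaks_secret(
    "https://angelic-service.example.com/api/v1/innocuous-thing?param=harmless"
)  # False
```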

fnord77 7 hours ago | parent | prev [-]

> 1. Don't let it send emails from your personal account; only let it draft emails and share the link with you.

Right now there's no way to get fine-grained draft/read-only permissions on most email providers or clients. If it can read your email, it can send email.

> 3. Don't let your agents see any secrets. Swap placeholder secrets for real ones at your gateway, and put a human in the loop for secrets you care about.

Harder than you might think: Openclaw found my browser cookies. (I ran it in a VM, so no serious cookies were found, but still.)

buremba 6 hours ago | parent | next [-]

> Right now there's no way to get fine-grained draft/read-only permissions on most email providers or clients. If it can read your email, it can send email.

> Harder than you might think: Openclaw found my browser cookies. (I ran it in a VM, so no serious cookies were found, but still.)

You should never give any secrets to your agents, such as your Gmail access tokens. Whenever an agent needs to take an action, it should issue the request, and your proxy should check whether the action is allowed and inject the secrets on the fly.

That means agents should not have internet access without a proxy that has proper guardrails. Openclaw doesn't have this model, unfortunately, so I had to build a multi-tenant version of Openclaw with a gateway system to implement these security boundaries.
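A sketch of that placeholder-swap gateway (hypothetical names and policy shape; the real thing would sit in the HTTP path):

```python
# Gateway-side secret store: the agent only ever sees the placeholder.
REAL_SECRETS = {"{{GMAIL_TOKEN}}": "ya29.example-real-token"}

# Policy table: which (method, host) actions are allowed at all.
ALLOWED = {("POST", "gmail.googleapis.com")}

def forward(method, host, headers):
    """Reject disallowed actions; otherwise swap placeholder secrets
    for real values so the agent never handles them."""
    if (method, host) not in ALLOWED:
        raise PermissionError(f"{method} {host} is not an allowed action")

    def inject(value):
        for placeholder, real in REAL_SECRETS.items():
            value = value.replace(placeholder, real)
        return value

    return {k: inject(v) for k, v in headers.items()}

# The agent's request carries only the placeholder:
out = forward("POST", "gmail.googleapis.com",
              {"Authorization": "Bearer {{GMAIL_TOKEN}}"})
# out["Authorization"] now carries the real token; the agent never saw it.
```

This is also where you'd put the human-in-the-loop check for secrets you care about: pause in `forward` and wait for approval before injecting.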

zahlman 2 hours ago | parent [-]

> That means agents should not have internet access without a proxy that has proper guardrails. Openclaw doesn't have this model, unfortunately, so I had to build a multi-tenant version of Openclaw with a gateway system to implement these security boundaries.

I wonder how long until we see a startup offering such a proxy as a service.

livestories 29 minutes ago | parent | prev | next [-]

I think mailto: links they output (à la https://mailtolink.me/) are a great way to get these drafts out.

zahlman 2 hours ago | parent | prev | next [-]

> Harder than you might think: Openclaw found my browser cookies. (I ran it in a VM, so no serious cookies were found, but still.)

It's easy, and you did it the right way. Read "don't let your agents see any secret" as "don't put secrets in a filesystem the agents have access to".

arianvanp 6 hours ago | parent | prev [-]

Literally every email client on the planet has supported `mailto:` URIs since basically the beginning of the World Wide Web.

Just generate a mailto: URI with the body set to the draft.
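For example, with Python's standard library (the mailto: scheme is specified in RFC 6068; the percent-encoding comes from urllib.parse.quote):

```python
from urllib.parse import quote

def mailto_uri(to, subject, body):
    """Build a mailto: URI so the agent's draft opens in the user's own
    mail client; the user reviews it and hits send themselves."""
    return f"mailto:{to}?subject={quote(subject)}&body={quote(body)}"

uri = mailto_uri("alice@example.com", "Re: meeting", "Works for me.\nSee you then.")
# -> mailto:alice@example.com?subject=Re%3A%20meeting&body=Works%20for%20me.%0ASee%20you%20then.
```

The agent never needs send permission: it just emits a link.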