jameslk 4 hours ago
One safety pattern I’m baking into CLI tools meant for agents: anytime an agent could do something very bad, like email-blast too many people, the CLI tool requires a one-time password. The tool tells the agent to ask the user for it, and the agent cannot proceed without it. The instructions from the tool show an all-caps message explaining the risk and telling the agent that it must prompt the user for the OTP.

I haven't used any of the *Claws yet, but this seems like an essential poor man's human-in-the-loop implementation that may help prevent some pain.

I prefer to make my own agent CLIs for everything, for reasons like this and many others: to fully control what the tool may do and to make the tools more useful.
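A minimal sketch of such an OTP gate, assuming the tool itself generates the code and delivers it to the user over a channel the agent cannot read. The names `OtpGate`, `send_blast`, `deliver_to_user` and the threshold `MAX_RECIPIENTS_WITHOUT_OTP` are hypothetical and not taken from the comment above. The point it illustrates is that the check lives in the tool, so the agent cannot proceed just by ignoring the instructions.

```python
import hashlib
import hmac
import secrets

MAX_RECIPIENTS_WITHOUT_OTP = 10  # hypothetical threshold for "too many people"


class OtpGate:
    """Holds one pending one-time code and verifies it at most once."""

    def __init__(self, deliver_to_user):
        # deliver_to_user: callback that shows the code to the human only
        # (assumption: the agent has no access to this channel).
        self._deliver = deliver_to_user
        self._pending_hash = None

    def challenge(self):
        code = f"{secrets.randbelow(10**6):06d}"
        self._pending_hash = hashlib.sha256(code.encode()).hexdigest()
        self._deliver(code)
        return ("DANGER: THIS ACTION EMAILS A LARGE AUDIENCE. "
                "ASK THE USER FOR THE ONE-TIME PASSWORD AND RETRY WITH IT.")

    def verify(self, code):
        if self._pending_hash is None:
            return False
        supplied = hashlib.sha256(code.encode()).hexdigest()
        ok = hmac.compare_digest(supplied, self._pending_hash)
        self._pending_hash = None  # single use, even after a wrong guess
        return ok


def send_blast(recipients, gate, otp=None):
    """Pretend to send an email; large audiences require a valid OTP."""
    if len(recipients) > MAX_RECIPIENTS_WITHOUT_OTP:
        if otp is None:
            return gate.challenge()
        if not gate.verify(otp):
            return "REFUSED: invalid or expired one-time password"
    return f"sent to {len(recipients)} recipients"
```

In a real CLI the pending code would have to survive between invocations (for example in a file the agent's user account cannot read); this sketch keeps it in memory for brevity.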
ezst 2 hours ago
Now we do computing like we play Sim City: sketching fuzzy plans and hoping those little creatures behave the way we thought they might. All the beauty and guarantees offered by a system obeying strict and predictable rules go down the drain, because life's so boring, apparently.
sowbug 3 hours ago
Another pattern would mirror BigCorp process: you need VP approval for the privileged operation. If the agent can email or chat with the human (or even a strict, narrow-purpose agent [1] whose job it is to be the approver), then the approver can reply with an answer. This is basically the same as your pattern, except the trust is in the channel between the agent and the approver, rather than in knowledge of the password. But it's a little more usable if the approver is a human who's out running an errand in the real world.

[1] Cf. Driver by qntm.
ZitchDog 4 hours ago
I've created my own "claw" running on fly.io with a pattern that seems to work well. I have MCP tools for actions where I want to ensure a human in the loop: email sending, Slack message sending, etc. I call these "activities". The only way for my claw to execute these commands is to create an activity, which generates a link with a summary of the activity for me to approve.
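The activity flow described above might look roughly like this. It is a sketch under assumptions, not the commenter's implementation: `ActivityQueue`, `Activity` and the URL `https://claw.example/approve` are hypothetical, and a real deployment would serve the link from a web handler and authenticate whoever clicks it.

```python
import secrets
from dataclasses import dataclass
from typing import Callable, Dict

APPROVAL_BASE_URL = "https://claw.example/approve"  # hypothetical endpoint


@dataclass
class Activity:
    summary: str                # what the human sees before approving
    action: Callable[[], str]   # the side effect, deferred until approval
    status: str = "pending"


class ActivityQueue:
    """The agent can only create activities; only approve() runs them."""

    def __init__(self):
        self._by_token: Dict[str, Activity] = {}

    def create(self, summary, action):
        # An unguessable token makes the link itself the approval capability.
        token = secrets.token_urlsafe(16)
        self._by_token[token] = Activity(summary, action)
        return f"{APPROVAL_BASE_URL}/{token}"

    def approve(self, token):
        activity = self._by_token.get(token)
        if activity is None or activity.status != "pending":
            raise PermissionError("unknown or already handled activity")
        activity.status = "approved"
        return activity.action()
```

The design choice worth noting is that the tool exposed to the agent never performs the side effect itself; it only records a deferred action, so nothing is sent unless the human follows the link.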
aqme28 4 hours ago
How do you enforce this? You have a system where the agent can email people, but cannot email "too many people" without a password?
roberttod 2 hours ago
I created my own version with an inner LLM and an outer orchestration layer for permissions. I don't think the OTP is needed here? The outer layer pings me on Signal when a tool call needs a permission, and an LLM running in that outer layer looks at the trail up to that point to help me catch anything strange. I can then give permission once, for a time limit, or forever on future tool calls.
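The once / time-limited / forever grants could be tracked along these lines. This is only a sketch: `PermissionLayer` and the `ask_human` callback (standing in for the Signal ping) are hypothetical names, and the LLM review of the trail is left out.

```python
import time
from typing import Dict

FOREVER = float("inf")


class PermissionLayer:
    """Outer layer: every tool call passes through call()."""

    def __init__(self, ask_human, clock=time.monotonic):
        # ask_human(tool_name, args) returns "deny", "once", "forever",
        # or a number of seconds the grant should last (assumed protocol).
        self._ask = ask_human
        self._clock = clock
        self._grants: Dict[str, float] = {}  # tool name -> expiry time

    def call(self, tool_name, tool, *args):
        expiry = self._grants.get(tool_name)
        if expiry is None or self._clock() >= expiry:
            decision = self._ask(tool_name, args)
            if decision == "deny":
                raise PermissionError(f"{tool_name} denied")
            if decision == "forever":
                self._grants[tool_name] = FOREVER
            elif decision != "once":
                self._grants[tool_name] = self._clock() + float(decision)
            # "once" stores nothing, so the next call asks again.
        return tool(*args)
```

Because the inner agent only ever receives the return value of `call()`, it has no way to mint a grant for itself; the grant table lives entirely in the outer process.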
IMTDb 4 hours ago
So the human becomes just a provider of those 6-digit codes? That’s already the main problem I have with most agents. I want them to perform a very easy task: « fetch all receipts from websites x, y and z and upload them to the correct expense in my expense-tracking tool ». AIs are perfectly capable of performing this. But because every website requires SSO + 2FA, with no possibility of removing it, I effectively have to watch them do it, and my whole existence can be summarized as: « look at your phone and input the 6 digits ». The thing I want AI to be able to do on my behalf is manage those 2FA steps, not add more.
soleveloper 2 hours ago
Will that protect you from the agent changing the code to bypass those safety mechanisms, since the human is "too slow to respond" or in case of "agent decided emergency"?
UncleMeat 2 hours ago
Does it actually require an OTP or is this just hoping that the agent follows the instructions every single time?