| ▲ | frolvlad 4 hours ago | |||||||||||||||||||||||||
Well, the challenge is to know if the action supposed to be executed BEFORE it is requested to be executed. If the email with my secrets is sent, it is too late to deal with the consequences. Sandboxes could provide that level of observability, HOWEVER, it is a hard lift. Yet, I don't have better ideas either. Do you? | ||||||||||||||||||||||||||
| ▲ | liuliu 4 hours ago | parent | next [-] | |||||||||||||||||||||||||
The solution is to make the model stronger so the malicious intents can be better distinguished (and no, it is not a guarantee, like many things in life). Sandbox is a basic, but as long as you give the model your credential, there isn't much guardrails can be done other than making the model stronger (separate guard model is the wrong path IMHO). | ||||||||||||||||||||||||||
| ||||||||||||||||||||||||||
| ▲ | ramoz 4 hours ago | parent | prev [-] | |||||||||||||||||||||||||
if you extend the definition of sandbox, then yea. Solutions no, for now continued cat/mouse with things like "good agents" in the mix (i.e. ai as a judge - of course just as exploitable through prompt injection), and deterministic policy where you can (e.g. OPA/rego). We should continue to enable better integrations with runtime - why i created the original feature request for hooks in claude code. Things like IFC or agent-as-a-judge can form some early useful solutions. | ||||||||||||||||||||||||||