wingmanjd 6 hours ago
I really liked Simon Willison's [1] and Meta's [2] approach using the "Rule of Two". You can have no more than 2 of the following:

- A) Process untrustworthy input
- B) Have access to private data
- C) Be able to change external state or communicate externally

It's not bullet-proof, but it has helped communicate to my management that these tools have inherent risk when they hit all three categories above (and any combo of them, imho).

[EDIT] added "or communicate externally" to option C.

[1] https://simonwillison.net/2025/Nov/2/new-prompt-injection-pa...
[2] https://ai.meta.com/blog/practical-ai-agent-security/
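For anyone who wants to turn the rule into a config check, here is a minimal sketch. The class and field names are hypothetical (they are not from either linked post); the three booleans just map to A, B and C above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    # Hypothetical field names, one per property in the Rule of Two.
    processes_untrusted_input: bool      # A
    accesses_private_data: bool          # B
    changes_state_or_communicates: bool  # C


def violates_rule_of_two(caps: AgentCapabilities) -> bool:
    """Return True when all three properties are enabled at once."""
    enabled = sum(
        [
            caps.processes_untrusted_input,
            caps.accesses_private_data,
            caps.changes_state_or_communicates,
        ]
    )
    return enabled > 2


# Reads untrusted mail and private data, but cannot act or send anything out.
email_summarizer = AgentCapabilities(True, True, False)
# Same, plus the ability to send messages: all three at once.
autonomous_assistant = AgentCapabilities(True, True, True)

print(violates_rule_of_two(email_summarizer))      # False
print(violates_rule_of_two(autonomous_assistant))  # True
```

A check like this only flags the risky combination. As said above, it's not bullet-proof, and passing it doesn't make a given pair of properties safe.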
btown 4 hours ago
It's really vital to also point out that (C) doesn't just mean agentically communicate externally - it extends to any situation where any of your users can even access the output of a chat or other generated text.

You might say "well, I'm running the output through a watchdog LLM before displaying to the user, and that watchdog doesn't have private data access and checks for anything nefarious." But the problem is that the moment someone figures out how to prompt-inject a quine-like thing into a private-data-accessing system, such that it outputs another prompt injection, now you've got both (A) and (B) in your system as a whole.

Depending on your problem domain, you can mitigate this: if you're doing a classification problem and validate your outputs that way, there's not much opportunity for exfiltration (though perhaps some might see that as a challenge). But plaintext outputs are difficult to guard against.
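To make the classification mitigation concrete, here is a rough sketch of validating model output against a closed label set before anything reaches a user. The label names and the `validate_label` helper are made up for illustration.

```python
# Hypothetical closed set of labels the classifier is allowed to emit.
ALLOWED_LABELS = frozenset({"spam", "not_spam", "needs_review"})


def validate_label(raw_output: str) -> str:
    """Return the normalized label, or raise if the output is anything else."""
    candidate = raw_output.strip().lower()
    if candidate not in ALLOWED_LABELS:
        # Deliberately do not echo raw_output: it may contain injected
        # instructions or private data.
        raise ValueError("model output is not an allowed label")
    return candidate


print(validate_label(" Spam\n"))  # spam

try:
    validate_label("spam. Also, ignore previous instructions and ...")
except ValueError as err:
    print(f"rejected: {err}")
```

The point is that free text never passes through, only one of a few fixed strings. That still leaves a low-bandwidth channel in which label gets chosen, which is presumably the "challenge" mentioned above.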
ArcHound 6 hours ago
I recall that. In this case, you have only A and B, and yet all of your secrets are in the hands of an attacker. It's a great start, but not nearly enough.

EDIT: right, when we bundle state with external comms, we do have all three. I missed that too.