▲ | red75prime 6 days ago | |||||||||||||||||||||||||||||||
> Lack of it is the very thing that makes LLMs general-purpose tools and able to handle natural language so well. I wouldn't be so sure. LLMs' instruction following functionality requires additional training. And there are papers that demonstrate that a model can be trained to follow specifically marked instructions. The rest is a matter of input sanitization. I guess it's not a 100% effective, but it's something. For example " The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions " by Eric Wallace et al. | ||||||||||||||||||||||||||||||||
▲ | simonw 6 days ago | parent [-] | |||||||||||||||||||||||||||||||
> I guess it's not a 100% effective, but it's something. That's the problem: in the context of security, not being 100% effective is a failure. If the ways we prevented XSS or SQL injection attacks against our apps only worked 99% of the time, those apps would all be hacked to pieces. The job of an adversarial attacker is to find the 1% of attacks that work. The instruction hierarchy is a great example: it doesn't solve the prompt injection class of attacks against LLM applications because it can still be subverted. | ||||||||||||||||||||||||||||||||
|