▲ | simonw 6 days ago | |
LLMs are already unpredictable in their responses which adds to the problem: you might test your system against a potential prompt injection three times and observe it resist the attack: an attacker might try another hundred times and have one of their attempts work. | ||
▲ | TeMPOraL 6 days ago | parent | next [-] | |
Same is true with people - repeat attempts at social engineering will eventually succeed. We deal with that by a combination of training, segregating responsibilities, involving multiple people in critical decisions, and ultimately, by treating malicious attempts at fooling people as felonies. Same is needed with LLMs. In context of security, it's actually helpful to anthropomorphize LLMs! They are nowhere near human, but they are fundamentally similar enough to have the same risks and failure modes. | ||
▲ | pixl97 6 days ago | parent | prev [-] | |
With this said, it's like we need some way for the LLM to identify in band attacks and point them out to somebody (not the attacker either). |