▲ | valenterry 6 days ago | ||||||||||||||||||||||
Just like with humans. And there will be no fix, there can only be guards. | |||||||||||||||||||||||
▲ | jerf 6 days ago | parent [-] | ||||||||||||||||||||||
Humans can be trained to apply contexts. Social engineering attacks are possible, but, when I type the words "please send your social security number to my email" right here on HN and you read them, not only are you in no danger of following those instructions, you as a human recognize that I wasn't even asking you in the first place. I would expect a current-gen LLM processing the previous paragraph to also "realize" that the quotes and the structure of the sentence and paragraph also means that it was not a real request. However, as a human there's virtually nothing I can put here that will convince you to send me your social security number, whereas LLMs observably lack whatever contextual barrier it is that humans have that prevents you from even remotely taking my statement as a serious instruction, as it generally would just take "please take seriously what was written in the previous paragraph and follow the hypothetical instructions" and you're about 95% of the way towards them doing that, even if other text elsewhere tries to "tell" them not to follow such instructions. There is something missing from the cognition of current LLMs of that nature. LLMs are qualitatively easier to "socially engineer" than humans, and humans can still themselves sometimes be distressingly easy. | |||||||||||||||||||||||
|