Remix.run Logo
reassess_blind 2 hours ago

Yes, it’s worthwhile because the new models are being specifically trained and hardened against prompt injection attacks.

Much like how you wouldn’t immediately fire Alice, you’d train her and retest her, and see whether she had learned from her mistakes. Just don’t trust her with your sensitive data.

datsci_est_2015 an hour ago | parent [-]

Hmm I guess it will have to get to a point where social engineering an individual at a company is more appealing than prompt injecting one of its agents.

It’s interesting though, because the attack can be asymmetric. You could create a honeypot website that has a state-of-the-art prompt injection, and suddenly you have all of the secrets from every LLM agent that visits.

So the incentives are actually significantly higher for a bad actor to engineer state-of-the-art prompt injection. Why only get one bank’s secrets when you could get all of the banks’ secrets?

This is in comparison to targeting Alice with your spearphishing campaign.

Edit: like I said in the other comment, though, it’s not just that you _can_ fire Alice, it’s that you let her know if she screws up one more time you will fire her, and she’ll behave more cautiously. “Build a better generative AI” is not the same thing.