yen223 2 hours ago

There's a lot of overlap between the "disregard this" vulnerability among LLMs and social engineering vulnerabilities among humans.

The mitigations are also largely the same, i.e. limit the blast radius of what a single compromised agent (LLM or human) can do.

calpaterson 2 hours ago

I agree and one of the things that makes it harder to handle "disregard that!" is that many models for LLM deployment involve positioning the agent centrally and giving it admin superpowers.

I mention in the footnotes that I think it makes more sense for the end-user of the LLM to be the one running it. That meshes better with RBAC (the user's LLM session only has the perms the user is actually entitled to) and doesn't devolve into praying the LLM stays on-task.
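
To make the RBAC point concrete, here is a rough sketch of what "the session only has the user's perms" could look like. Everything in it is hypothetical (the `Session` type, the `TOOLS` table, the permission strings and `run_tool` are made up for illustration, not taken from any real framework). The idea is that the permission check lives outside the model, so an injected "disregard that!" can only ask for things the user could already do:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical per-user LLM session carrying only that user's permissions."""
    user: str
    permissions: frozenset


# Hypothetical tool table: tool name -> (required permission, implementation).
TOOLS = {
    "read_ticket": ("tickets:read", lambda arg: f"ticket {arg} contents"),
    "delete_user": ("users:delete", lambda arg: f"deleted {arg}"),
}


def run_tool(session: Session, tool_name: str, arg: str) -> str:
    """Run a tool call requested by the model, enforcing the user's permissions.

    The check happens here, in ordinary code, not in the prompt.
    """
    if tool_name not in TOOLS:
        raise KeyError(f"unknown tool: {tool_name}")
    required, implementation = TOOLS[tool_name]
    if required not in session.permissions:
        raise PermissionError(f"{session.user} lacks {required}")
    return implementation(arg)


# A user who may only read tickets.
session = Session(user="alice", permissions=frozenset({"tickets:read"}))

# A legitimate tool call goes through.
allowed = run_tool(session, "read_ticket", "42")

# A tool call the model was tricked into making is refused by the dispatcher.
try:
    run_tool(session, "delete_user", "bob")
    blocked = False
except PermissionError:
    blocked = True
```

The model can still be talked into misbehaving, but the worst case is bounded by what alice herself is entitled to do.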

zahlman an hour ago

It also seems to have a fair bit in common with SQL injection.
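
For anyone who hasn't seen the SQL version, a minimal sketch using Python's built-in `sqlite3` (the table and the attacker string are invented for the example). In both cases the root problem is untrusted data being spliced into the same channel as the instructions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a1"), ("bob", "b2")],
)

# Untrusted input that tries to rewrite the query it is embedded in.
attacker_input = "nobody' OR '1'='1"

# Unsafe: the input is concatenated into the query text, so it is parsed as SQL.
unsafe_rows = conn.execute(
    f"SELECT secret FROM users WHERE name = '{attacker_input}'"
).fetchall()

# Safe: the input is bound as a parameter, so it is only ever treated as data.
safe_rows = conn.execute(
    "SELECT secret FROM users WHERE name = ?",
    (attacker_input,),
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The concatenated query returns every row while the parameterized one returns none. SQL has that parameter mechanism for keeping data out of the command channel; as far as I know, prompts have no equally strict equivalent, which is why limiting the blast radius matters so much.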