▲ | skaul 4 days ago | ||||||||||||||||||||||||||||||||||||||||
> you reply to me like I need to be lectured That's not my intention! Just stating how we're thinking about this. > defense in depth is to prevent one layer failure from getting to the next We think a separate model can help with one layer of this: checking if the planner model's actions are aligned with the user's request. But we also need guarantees at other layers, like distinguishing web contents from user instructions, or locking down what tools the model has access to in what context. Fundamentally, though, like we said in the blog post: "The attack we developed shows that traditional Web security assumptions don’t hold for agentic AI, and that we need new security and privacy architectures for agentic browsing." | |||||||||||||||||||||||||||||||||||||||||
▲ | simonw 4 days ago | parent [-] | ||||||||||||||||||||||||||||||||||||||||
"But we also need guarantees at other layers, like distinguishing web contents from user instructions" How do you intend to do that? In the three years I've spent researching and writing about prompt injection attacks I haven't seen a single credible technique from anyone that can distinguish content from instructions. If you can solve that you'll have solved the entire class of prompt injection attacks! | |||||||||||||||||||||||||||||||||||||||||
|