EFLKumo 3 hours ago
This isn't the first time we've heard about prompt injection attacks, and this one is certainly Microsoft's fault. Many people are discussing the prompt injection itself, whether Copilot should be able to defend against prompt injections, and so on. But that's not the problem.

OpenAI released their LLM-driven browser Atlas last year. Though their team is brilliant (https://openai.com/index/hardening-atlas-against-prompt-inje...), there have been a number of successful injection attacks. IMO the real vulnerability is located at the "Act" part of "ReAct" (reasoning and action) agent framework.

> “[Copilot] Cowork asks for your permission before taking sensitive actions...” ... when the recipient is the active user, these actions execute immediately without requiring human approval (users do not have a setting to modify this behavior).

> Copilot Cowork can retrieve ‘pre-authenticated download links’ for files the user has access to, which allow anyone who opens the link to download that file.

> Microsoft Copilot Cowork has read access to essentially any resource a user does through Microsoft Graph. As such, the primary mechanism to reduce the blast radius of attacks like this is to restrict excessive permissioning across one’s Microsoft ecosystem.

Take it easy. Throughout the whole attack flow, Microsoft gives Cowork unrestricted access and the ability to bypass approvals. I don't see much of a problem with LLMs here.

It's said that the attack is also a threat to Opus 4.7, but several times I've seen Opus 4.7 refuse context7.com's "prompt injections", which only asked Opus to have me create a context7 API key to get more free requests. In my personal experience, such models are indeed trained to detect injections, but injections can disguise themselves as something like Agent Skills, and red teams will always find ways to win. We shouldn't pin too much hope on defending against injections, and should instead concentrate on restricting LLMs' permissions.
The widespread use of CLIs in agent workflows (especially those of coding agents) also concerns me, since most CLI tools an agent can access run with the same permissions as the user.
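To make the CLI point concrete, here is a minimal sketch of an allowlist check that could sit between an agent and the shell. Everything in it is an assumption for illustration: the `ALLOWED` table, the `is_allowed` name, and the choice of commands are hypothetical, not taken from any real agent. It only limits which programs and subcommands run; it does nothing about which files an allowed program such as `cat` may read, so it would complement OS-level restrictions rather than replace them.

```python
import shlex

# Hypothetical allowlist: program name -> permitted subcommands.
# An empty set means the program is allowed with any arguments.
ALLOWED = {
    "git": {"status", "diff", "log"},
    "ls": set(),
    "cat": set(),
}

# Common shell metacharacters used for chaining, piping, substitution, redirection.
SHELL_META = set(";|&$`><\n")


def is_allowed(command: str) -> bool:
    """Return True only if the command matches the allowlist."""
    # Reject metacharacters outright so chaining cannot smuggle in another program.
    if any(ch in SHELL_META for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar parse errors are treated as a refusal.
        return False
    if not argv or argv[0] not in ALLOWED:
        return False
    subcommands = ALLOWED[argv[0]]
    if subcommands:
        # Programs with a subcommand list must name one of them explicitly.
        return len(argv) > 1 and argv[1] in subcommands
    return True
```

With this table, `git status` passes while `git push origin main`, `rm -rf /`, and `cat a; rm -rf /` are all refused. The check is plain string and set logic, so injected text in the model's context cannot talk it into a different answer.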
stingraycharles 3 hours ago
> “IMO the real vulnerability is located at the "Act" part of "ReAct" (reasoning and action) agent framework.”

This is a fancy way of saying that “the problem is tool calling”, which is obviously true. The problem is that, when it works correctly (99.99% of the time), tool calling adds so much value to LLMs.

Sandboxing is a step in the right direction, but it can also add friction. Guardrails are also good, but they add latency and cost, and still don't solve 100% of the issues.

IMHO a proper solution to this problem does not currently exist and has yet to be discovered. Whatever it turns out to be, it should NOT be based on LLMs, so guardrails are the wrong direction (albeit effective and easier to implement).
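For illustration only, this is roughly what a check that is not based on an LLM could look like: a deterministic gate in front of tool calls. It is not the proper solution argued to be missing above, just a sketch. The tool names, the `ToolCall` type, and the `decide` function are all hypothetical and do not describe how Copilot Cowork or any other product works.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK_USER = "ask_user"
    DENY = "deny"


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


# Hypothetical tool names, grouped by how much damage a call could do.
READ_ONLY = {"search_files", "read_file"}
SENSITIVE = {"send_email", "create_share_link", "delete_file"}


def decide(call: ToolCall) -> Decision:
    """Map a tool call to a decision using fixed rules only."""
    if call.name in READ_ONLY:
        return Decision.ALLOW
    if call.name in SENSITIVE:
        # No exception based on arguments: an email addressed to the
        # active user still needs approval, like any other recipient.
        return Decision.ASK_USER
    # Unknown tools are denied by default.
    return Decision.DENY
```

Because the decision depends only on the tool name, no wording in a prompt can change it. The trade-off is the friction mentioned above: every sensitive call interrupts the user.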
ethin 2 hours ago
The problem is natural language as a medium. It is so ambiguous, and offers so many ways to say literally anything imaginable, that there is no way of protecting against prompt injection without some kind of NLP filter or something. Given these problems, I don't really see how anyone can develop protection against this.