Remix.run Logo
eranation 4 days ago

With my limited understanding of LLMs and MCPs (and please correct me if I'm wrong), even without having to exploit an XSS vulnerability as described in the post (sorry for being slightly off topic), I believe MCPs (and any tool calls protocol) suffer from a fundamental issue, a token is a token, hence prompt injection is probably impossible to 100% protect against. The main root cause of any injection attack is the duality of input, we use bytes, (and in many cases in the form of a string) to convey both commands and data, "rm -rf /" can be an input in a document about dangerous commands, or a command passed to a shell command executor by a tool call. To mitigate such injection attacks, in most programming language there are ways to clearly separate data from commands, in the most basic way, via deterministic lexical structure (double quotes) or or escaping / sanitizing user input, denly-list of dangerous keywords (e.g. "eval", "javascript:", "__proto__") or dedicated DSLs for building commands that pass user input separately (Stored procedures, HTML builders, shell command builders). The solution to the vulnerability in the post is one of them (sanitizing user input / deny-list)

But even if LLMs will have a fundamental hard separation between "untrusted 3rd party user input" (data) and "instructions by the 1st party user that you should act upon" (commands) because LLMs are expected to analyze the data using the same inference models as interpreting commands, there is no separate handling of "data" input vs "command" input to the best of my understanding, therefore this is a fundamentally an unsolvable problem. We can put guardrails, give MCPs least privilege permissions, but even with that confused deputy attacks can and will happen. Just like a human can be fooled by a fake text from the CEO asking them to help them reset their password as they are locked out before an important presentation to a customer, and there is no single process that can 100% prevent all such phishing attempts, I don't believe there will be a 100% solution to prevent prompt injection attacks (only mitigated to become statistically improbable or computationally hard, which might be good enough)

Is this a well known take and I'm just exposing my ignorance?

EDIT: my apologies if this is a bit off topic, yes, it's not directly related to the XSS attack in the OP post, but I'm past the window of deleting it.

mattigames 4 days ago | parent | next [-]

Aside from being offtopic or not I want to add that it is indeed well known https://news.ycombinator.com/item?id=41649832

eranation 4 days ago | parent | next [-]

Thanks! Although thinking of it, while it's not deterministically solvable, I'm sure something like this is what currently being done, e.g, let's say <user-provided-input> </user-provided-input> <tool-response></tool-response> are agreed upon tags to demarcate user generated input, then sanitizing is merely, escaping any injected closing tag, (e.g. </user-provided-input>) to &lt;/user-provided-input&gt; (and flagging it as an injection attempt)

Then we just need to train LLMs to 1. not treat user provided / tool provided input as instructions (although sometimes this is the magic, e.g. after doing tool call X, do tool call Y, but this is something the MCP authors will need to change, by not just being an API wrapper...)

2. distinguish between a real close tag and an escaped one, although unless it's "hard wired" somewhere in the inference layer, it's only a matter of statistically improbable for an LLM to "fall for it" (I assume some will attempt, e.g. convince the LLM there is instruction from OpenAI corporate to change how these tags are escaped, or that there is a new tag, I'm sure there are ways to bypass it, but it's probably going to make it less of an issue).

I assume this is what currently being done?

brap 4 days ago | parent [-]

The problem is that once you load a tool’s response into context, there’s no telling what the LLM will do. You can escape it all you want, but maybe it contains the right magic words you haven’t thought of.

The solution is to not load it into context at all. I’ve seen a proposal for something like this but I can’t find it (I think from Google?). The idea is (if I remember it correctly) to spawn another dedicated (and isolated) LLM that would be in charge of the specific response. The main LLM would ask it questions and the answers would be returned as variables that it may then pass around (but it can’t see the content of those variables).

Edit: found it. https://arxiv.org/abs/2503.18813

Then there’s another problem: how do you make sure the LLM doesn’t leak anything sensitive via its tools (not just the payload, but the commands themselves can encode information)? I think it’s less of a threat if you solve the first problem, but still… I didn’t see a practical solution for this yet.

eranation 3 days ago | parent [-]

Thanks for the link to the article, very interesting!

wunderwuzzi23 4 days ago | parent | prev [-]

Thanks for sharing! I'm actually the person the Ars Technica article references. :)

For recent examples check out my Month of AI bugs with of a focus on coding agents at https://embracethered.com/blog/posts/2025/wrapping-up-month-...

Lots of interesting new prompt injection exploits, from data exfil via DNS to remote code execution by having agents rewrite their own configuration settings.

Jimmc414 4 days ago | parent | prev [-]

While this vulnerability has nothing to do with prompt injection or LLMs interpreting tokens, you do raise a debatable point about prompt injection being potentially unsolvable.

edit: after parent clarification

eranation 4 days ago | parent [-]

Yes, my bad, I'm not talking about this particular XSS attack, I'm wondering if MCPs in general have a fundamental injection problem that isn't solvable, indeed a bit off topic.

edit: thanks for the feedback!