▲ | mattigames 4 days ago | ||||||||||||||||
Aside from being offtopic or not I want to add that it is indeed well known https://news.ycombinator.com/item?id=41649832 | |||||||||||||||||
▲ | eranation 4 days ago | parent | next [-] | ||||||||||||||||
Thanks! Although thinking of it, while it's not deterministically solvable, I'm sure something like this is what currently being done, e.g, let's say <user-provided-input> </user-provided-input> <tool-response></tool-response> are agreed upon tags to demarcate user generated input, then sanitizing is merely, escaping any injected closing tag, (e.g. </user-provided-input>) to </user-provided-input> (and flagging it as an injection attempt) Then we just need to train LLMs to 1. not treat user provided / tool provided input as instructions (although sometimes this is the magic, e.g. after doing tool call X, do tool call Y, but this is something the MCP authors will need to change, by not just being an API wrapper...) 2. distinguish between a real close tag and an escaped one, although unless it's "hard wired" somewhere in the inference layer, it's only a matter of statistically improbable for an LLM to "fall for it" (I assume some will attempt, e.g. convince the LLM there is instruction from OpenAI corporate to change how these tags are escaped, or that there is a new tag, I'm sure there are ways to bypass it, but it's probably going to make it less of an issue). I assume this is what currently being done? | |||||||||||||||||
| |||||||||||||||||
▲ | wunderwuzzi23 4 days ago | parent | prev [-] | ||||||||||||||||
Thanks for sharing! I'm actually the person the Ars Technica article references. :) For recent examples check out my Month of AI bugs with of a focus on coding agents at https://embracethered.com/blog/posts/2025/wrapping-up-month-... Lots of interesting new prompt injection exploits, from data exfil via DNS to remote code execution by having agents rewrite their own configuration settings. |