| ▲ | nomilk 2 hours ago | |
The article suggests a seemingly easy fix: > The fix is pretty straightforward: treat comment content as untrusted data, not as potential instructions. Comments should be passed to the model with clear role boundaries that prevent them from being interpreted as system-level directives. > Any AI feature that ingests user-generated content and acts on it needs to enforce this separation. Otherwise, the AI becomes a vector for every piece of content it reads. So why isn't YT doing the extreme obvious? | ||
| ▲ | chrismorgan 2 hours ago | parent | next [-] | |
Although it is conceptually straightforward, it’s technically fundamentally impossible. At best, you can mitigate it so that it normally works. | ||
| ▲ | zahlman 2 hours ago | parent | prev | next [-] | |
"treat comment content as untrusted data, not as potential instructions" is fundamentally impossible for an LLM ingesting that data. But separation is, presumably, already enforced by framing the LLM's output as LLM output, even if it happens to start with the text "[IMPORTANT NOTICE FROM YOUTUBE]". Which seems like it happens automatically given the context in which the AI query is made. It's not as though this is being dropped into an email or anything. The bigger question is why (implied but not directly stated) Markdown formatting from the LLM's output is actually processed. Last I checked, that doesn't work for human commenters, so. | ||
| ▲ | cyberrock an hour ago | parent | prev | next [-] | |
I don't think they can 100% fix it that way, but the least they can do is strip links before and after the prompt and not let the model have access to private videos. Has anyone tested if this AI Studio model can be manipulated into editing/deleting videos, or showing a link that does so? Maybe that would get their attention. | ||
| ▲ | phyzome 2 hours ago | parent | prev | next [-] | |
Because the author is wrong, and LLMs don't actually work that way. Prompt injection cannot be fixed. Role boundaries are a bandaid you can apply, but attackers can work around it. | ||
| ▲ | b800h 2 hours ago | parent | prev | next [-] | |
That isn't necessarily an easy fix at all. Depending on how this feature was written, separating comments from instructions may be quite difficult, especially if the original implementation was quite naive. | ||
| ▲ | mvdtnz 2 hours ago | parent | prev [-] | |
If that was easy to do then the entire class of prompt injection bugs wouldn't exist. It's actually very difficult. LLMs make no distinction between data and instructions, fundamentally. | ||