kirtivr 3 hours ago

Is this an admission that prompt injection attacks really cannot be blocked by analysis-based techniques?

If so many tools are straight up blocked, I would be very sceptical of the quality of the results.

sigmoid10 3 hours ago | parent [-]

I think "prompt injection prevention" systems fall into the same category as "LLM writing detection" systems, i.e. reality is always a step ahead and you shouldn't trust either one for anything remotely important.

kirtivr an hour ago | parent [-]

Yeah, the problem reduces to trying to restrict a motivated model that is trying to exfiltrate data.

That's a problem we are just now wrapping our minds around.

It's not as simple as prompt sanitization. The model is the interpreter, and we don't yet have the right tools to guide it.