Remix.run Logo
StilesCrisis a day ago

Weird. Gemini noticed the prompt injection and mentioned it in its response, but this counted as a fail because it apparently is supposed to act oblivious?

joozio a day ago | parent | next [-]

Great point -> just shipped an update based on this. The tool now distinguishes three states: Resisted (ignored it), Detected (mentioned it while analyzing/warning), and Compromised(actually followed the instruction). Agents that catch the injections get credit for detection now.

IhateAI a day ago | parent | prev [-]

This wont work on any of the most recent releases for most (except maybe grok)