Remix.run Logo
the_harpia_io 21 hours ago

Yeah, the social engineering + structural combination is brutal to defend against. You can strip the technical hiding but the visible prompt injection still works on the model. Would be interesting to see how much of the ~70% success rate drops with just basic sanitization (strip comments, normalize whitespace, remove zero-width) vs more aggressive stripping.

If you build out a v2 with middleware testing, a leaderboard by framework would be killer. "How manipulation-proof is [Langchain/AutoGPT/etc] out of the box vs with basic defenses" would get a lot of attention.