the_harpia_io a day ago

This is cool - the ~70% success rate on basic attacks tracks with what I've seen. Most agent frameworks just pipe raw text through without any sanitization because "it's just summarizing a page, what could go wrong."

The screenshot approach nate mentions is interesting but feels like trading one problem for another. You're immune to text injection but now vulnerable to visual tricks - misleading rendered text, fake UI elements, Unicode lookalike characters that render identically but are actually different characters.

Curious if you've tested any agents that pre-process the HTML (stripping invisible elements, normalizing Unicode, etc.) before passing it to the model. That's the approach I've seen in a few internal tools, but I haven't benchmarked how effective it actually is against multi-layer attacks like yours.
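
To make the pre-processing idea concrete, here's a rough sketch using only the Python standard library. It is not taken from any of the tools I mentioned; the names `VisibleTextExtractor` and `sanitize_html` are made up for illustration. It assumes hiding is done with the `hidden` attribute or inline `display:none` / `visibility:hidden` styles, so it will miss anything hidden via CSS classes, off-screen positioning, tiny fonts, or matching text and background colors (those need a real rendering engine).

```python
import re
import unicodedata
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be tracked as "open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Elements whose contents are never shown to the reader as page text.
NON_CONTENT_TAGS = {"script", "style", "template", "noscript"}
# Assumption: only inline styles are checked, not stylesheets or classes.
HIDDEN_STYLE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.I)


class VisibleTextExtractor(HTMLParser):
    """Hypothetical helper: collects text that is not inside a hidden subtree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.hidden_stack = []  # open tags inside the current hidden subtree

    def _is_hidden(self, tag, attrs):
        attrs = dict(attrs)
        return (
            tag in NON_CONTENT_TAGS
            or "hidden" in attrs
            or HIDDEN_STYLE.search(attrs.get("style") or "") is not None
        )

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.hidden_stack or self._is_hidden(tag, attrs):
            self.hidden_stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.hidden_stack:
            # Pop up to and including the matching open tag, which also
            # closes any unclosed children such as a dangling <p>.
            while self.hidden_stack.pop() != tag:
                pass

    def handle_data(self, data):
        if not self.hidden_stack:
            self.chunks.append(data)
    # HTML comments are dropped because handle_comment is not overridden.


def sanitize_html(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    # NFKC folds compatibility forms (fullwidth letters, ligatures) to plain ones.
    text = unicodedata.normalize("NFKC", " ".join(parser.chunks))
    # Drop format characters (category Cf): zero-width spaces, bidi overrides, etc.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    return " ".join(text.split())
```

Two caveats on the design. NFKC does not fold cross-script lookalikes (a Cyrillic "а" stays Cyrillic), so that would need a separate confusables mapping. Dropping all `Cf` characters also removes zero-width joiners, which breaks some emoji sequences and scripts that rely on them, so a real tool would probably want an allowlist there. Joining chunks with a space can also split a word that has an inline tag in the middle of it.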