the_harpia_io a day ago
This is cool - the ~70% success rate on basic attacks tracks with what I've seen. Most agent frameworks just pipe raw text through without any sanitization because "it's just summarizing a page, what could go wrong." The screenshot approach nate mentions is interesting but feels like trading one problem for another. You're immune to hidden-text injection but now vulnerable to visual tricks - misleading rendered text, fake UI elements, Unicode lookalike characters that render identically but are different code points. Curious if you've tested any agents that pre-process the HTML - stripping invisible elements, normalizing Unicode, etc. - before passing it to the model. That's the approach I've seen in a few internal tools, but I haven't benchmarked how effective it actually is against multi-layer attacks like yours.
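For concreteness, here's roughly what I mean by pre-processing. This is a minimal stdlib-only sketch, not what any of those internal tools actually run: the names (`VisibleTextExtractor`, `normalize_text`, `sanitize_html`) and the list of "hidden" heuristics are my own assumptions, and it only looks at inline styles and attributes, so anything hidden via an external stylesheet, off-screen positioning, or same-color text gets straight through.

```python
import re
import unicodedata
from html.parser import HTMLParser

# Elements with no closing tag; never pushed on the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Elements whose text is never rendered as page content.
NON_CONTENT_TAGS = {"script", "style", "noscript", "template"}
# Assumed heuristic list of inline styles that hide text; far from complete.
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden"
    r"|font-size\s*:\s*0(?![.\d])|opacity\s*:\s*0(?![.\d])",
    re.IGNORECASE,
)


class VisibleTextExtractor(HTMLParser):
    """Collects text that is not inside an obviously hidden element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []        # (tag, hidden) for every open element
        self.hidden_depth = 0  # how many open ancestors are hidden
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        a = dict(attrs)
        hidden = (
            tag in NON_CONTENT_TAGS
            or "hidden" in a
            or (a.get("aria-hidden") or "").lower() == "true"
            or bool(HIDDEN_STYLE.search(a.get("style") or ""))
        )
        self.stack.append((tag, hidden))
        self.hidden_depth += hidden

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerates sloppy nesting.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                for _, hidden in self.stack[i:]:
                    self.hidden_depth -= hidden
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self.hidden_depth == 0:
            self.chunks.append(data)

    # HTML comments are dropped: handle_comment is not overridden.


def normalize_text(text):
    """NFKC-fold, drop format characters (category Cf), collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    return re.sub(r"\s+", " ", text).strip()


def sanitize_html(html):
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return normalize_text(" ".join(parser.chunks))
```

Two caveats. Dropping every `Cf` character kills zero-width spaces and bidi overrides, but it also removes the joiners that some scripts and emoji sequences legitimately need. And NFKC folds things like fullwidth letters and ligatures but does not map cross-script lookalikes (Cyrillic "а" stays Cyrillic), so homoglyphs would need a separate confusables table on top of this.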