| ▲ | joozio a day ago | |
That's a really interesting edge case - screenshot-based agents sidestep the entire attack surface because they never process raw HTML. All 10 attacks here are text/DOM-level. A visual-only agent would need a completely different attack vector (like rendered misleading text or optical tricks). Might be worth exploring as a v2. | ||
| ▲ | pixl97 a day ago | parent [-] | |
Yea, I was instantly thinking on what kind of optical tricks you could play on the LLM in this case. I was looking at some posts not long ago where LLMs were falling for the same kind of optical illusions that humans do, in this case the same color being contrasted by light and dark colors appears to be a different color. If the attacker knows what model you're using then it's very likely they could craft attacks against it based on information like this. What those attacks are still need explored. If I were arsed to do it, I'd start by injecting noise patterns in images that could be interpreted as text. | ||