Remix.run Logo
NotASithLord 2 hours ago

It's defense in depth, definitely not a silver bullet. The web runner has no access to wider capabilities outside of that page. So the only path for a prompt injection to do anything is to try and get itself included in the summary, and get the main loop to act on an instruction in that summary. That means getting pass two sets of <untrusted> tags and explicit instructions to treat everything inside as information, not instruction. Then the egress checkpoint and allow/deny whitelists are the final guards regardless of what the main loop decides to do. Trying to harden wherever I can if you have any recs.