jsnell | 5 hours ago:
I don't share your optimism. Those kinds of measures would just be security theater, not "a lot better". It's trivial for an attacker to keep a secret from ever appearing verbatim in the LLM's context or outputs, and once that workaround is implemented it will work reliably. The same goes for trying to statically detect shell tool invocations that could read and obfuscate a secret. The only thing that would actually work is some kind of syscall interception, but at that point you're just reinventing the sandbox (but worse). Your "visually inspect the contents of the URL" idea seems unlikely to help either: the attacker just makes one innocuous-looking request to get allowlisted first.
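To make the first point concrete, here's a minimal sketch (all names hypothetical, not from any real product) of why a literal-match filter on the secret is toothless: the guard scans tool output for the raw secret string, but a one-line encoding step means the secret never appears verbatim anywhere the filter looks.

```python
import base64

SECRET = "sk-live-abc123"  # hypothetical secret the guard is protecting

def guard(tool_output: str) -> str:
    """Naive filter: block any tool output containing the literal secret."""
    if SECRET in tool_output:
        raise ValueError("secret detected, output blocked")
    return tool_output

# A direct read is caught...
try:
    guard(SECRET)
except ValueError:
    print("direct leak blocked")

# ...but a trivially obfuscated read sails through, and the attacker
# just decodes it on the receiving end.
encoded = base64.b64encode(SECRET.encode()).decode()
print(guard(encoded))  # passes: "c2stbGl2ZS1hYmMxMjM="
```

And base64 is the dumbest possible encoding; rot13, hex, reversing the string, or splitting it across two requests all defeat the same check, which is why you end up needing interception at the syscall level rather than string matching.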