| ▲ | veganmosfet an hour ago | |
It would be nice to publish the exact setup used (workspace dump, OpenClaw version, ...) to be able to reproduce and try out more payloads. In general I have mixed feelings about this result: sure, opus4.6 is excellent at following user intent and recognise potential prompt injection attempts. But: Is the "security" prompt used realistic for a generic use-case (processing of emails)? I guess not. In my experiments - without this specific prompt - I was able to derail the user intent to make opus4.8 download and execute a malicious script [0] just by asking "Summarize my new emails". | ||