jesse_dot_id 4 hours ago

Are prompt injections solved? If OpenClaw is only useful when it has access to your digital life, then why does it matter where it runs? You might as well be asking me to keep my dead man's switch safely on the moon. If you find this software useful, you are sharing a countdown to a no-good, very bad day with everyone else who finds it useful. One zero-day prompt injection technique, your e-mail on a distribution list, and that's all she wrote.

brotchie 3 hours ago | parent | next [-]

The way I solved this: my OpenClaw doesn't interact directly with any of my personal data (calendar, Gmail, etc.).

I essentially have a separate process that syncs my Gmail, with message bodies encrypted using a key my OpenClaw doesn't have trivial access to. Another process then reads each email from the SQLite DB and runs Gemini 2 Flash Lite against it, with an anti-prompt-injection prompt plus structured data extraction (JSON in a specific format).

My claw can only read the sanitized structured data extraction (which is pretty verbose and can contain passages from the original email).

The primary attack vector is an attacker crafting an "inception" prompt injection, where they get a prompt injection through the Flash Lite sanitization and JSON output in such a way that it also prompt-injects my claw.

Still a non-zero risk, but mostly mitigates naive prompt injection attacks.
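A rough sketch of the pipeline described above. The `sanitize_email` stub stands in for the Flash Lite call with the anti-injection prompt; the table name, schema, and extraction fields are all illustrative assumptions, not the actual setup:

```python
import json
import sqlite3

def sanitize_email(body: str) -> dict:
    # HYPOTHETICAL stand-in for the LLM sanitization step. The real version
    # would send `body` to a small model with an anti-prompt-injection system
    # prompt and force a fixed JSON schema on the output, so the agent only
    # ever sees whitelisted fields, never the raw message.
    return {
        "sender_intent": "what the sender appears to want",
        "action_items": [],
        "quoted_passages": [line for line in body.splitlines() if line][:3],
    }

def build_sanitized_store(raw_emails):
    # The agent reads only from this table; the raw mailbox (and its
    # decryption key) stays out of the agent's reach.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sanitized (id INTEGER PRIMARY KEY, data TEXT)")
    for body in raw_emails:
        record = sanitize_email(body)
        db.execute("INSERT INTO sanitized (data) VALUES (?)",
                   (json.dumps(record),))
    db.commit()
    return db
```

Note that because `quoted_passages` can carry verbatim text from the original email, an injection that survives the sanitizer's schema can still reach the agent, which is exactly the "inception" risk described above.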

jakeydus an hour ago | parent [-]

That doesn’t sound like you solved it; it sounds like you obfuscated it. Feels a bit to me like you’ve got a wall around a property and people are using ladders to get in, so you built another wall around the first wall.

I recognize I’m being pedantic but two layers of the same kind of security (an LLM recognizing a prompt injection attempt) are not the same as solving a security vulnerability.

MetaWhirledPeas 3 hours ago | parent | prev | next [-]

I've never used OpenClaw, but as I understand it, it has a way of keeping a pseudo-memory for context? That alone would be interesting, even if it were only allowed to read the generic internet. Like having a little robot buddy that remembers you and past conversations. Maybe you could have it give you reminders and such, like you'd do with Alexa?

teh_infallible 2 hours ago | parent [-]

It basically writes a bunch of notes as markdown files and then injects them as part of its prompts. I saw someone compare it to the movie Memento, where the protagonist can’t form new memories so he tattoos notes all over his body.

MetaWhirledPeas 2 hours ago | parent [-]

That sounds like a good comparison.

quietbritishjim 4 hours ago | parent | prev | next [-]

It's a bit like the xkcd where the admin account is secure but all the useful information is in the user account anyway.

https://xkcd.com/1200/

Veen 3 hours ago | parent | prev | next [-]

It's not a soluble problem, at least not completely. The big frontier models are better at resisting prompt injection, but any LLM is vulnerable to some degree. If you give it access to arbitrary inputs like the web and to your personal data, there's a risk it'll disclose stuff you don't want it to.

It's annoying, because I love OpenClaw as an idea, but I don't trust it enough to give it what it needs to be useful.

plagiarist 4 hours ago | parent | prev [-]

IDGI. It is reading emails, which is a vector for prompt injection. It is also reading emails, which is where all password resets are sent. Anyone granting even read access to their primary email is playing with fire.

I personally don't see how the daily briefings or whatever are worth the risk.