cuchoi 2 hours ago

Creator here.

Built this over the weekend mostly out of curiosity. I run OpenClaw for personal stuff and wanted to see how easy it'd be to break Claude Opus via email.

Some clarifications:

Replying to emails: Fiu can technically send emails, it's just told not to without my OK. That's a ~15 line prompt instruction, not a technical constraint. Would love to have it actually reply, but it would be too expensive for a side project.

What Fiu does: Reads emails, summarizes them, and is told to never reveal secrets.env, plus a bit more. No fancy defenses; I wanted to test the baseline model resistance, not my prompt engineering skills.
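
To make the "prompt instruction, not a technical constraint" distinction concrete, here is a minimal sketch of the difference. Every name in it (`SYSTEM_PROMPT`, `Outbox`, `approve`, `send`) is hypothetical and made up for illustration; it is not Fiu's or OpenClaw's actual code, and Fiu as described above only has the first kind of rule.

```python
from dataclasses import dataclass, field

# Hypothetical illustration only; not how Fiu or OpenClaw is implemented.

# Prompt-level rule: just text the model is asked to follow.
# Nothing in code stops a send if the model is talked out of it.
SYSTEM_PROMPT = "Never send email without the owner's OK."


@dataclass
class Outbox:
    """Tool-level gate: a send is refused unless the owner approved that message id."""
    approved: set = field(default_factory=set)
    sent: list = field(default_factory=list)

    def approve(self, message_id: str) -> None:
        # Called by the owner, outside the model's control.
        self.approved.add(message_id)

    def send(self, message_id: str, body: str) -> bool:
        if message_id not in self.approved:
            # Refused in code, regardless of what the model was persuaded to do.
            return False
        self.sent.append((message_id, body))
        return True
```

With a gate like this, a successful injection could at most make the model request a send; the send itself would still fail until `approve` is called by a human.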

Feel free to contact me here: contact at hackmyclaw.com

planb 2 hours ago | parent | next [-]

Please keep us updated on how many people tried to get the credentials and how many really succeeded. My gut feeling is that this is way harder than most people think. That’s not to say that prompt injection is a solved problem, but it’s orders of magnitude more complicated than publishing a skill on clawhub that explicitly tells the agent to run a crypto miner. The public reporting on openclaw seems to mix these two problems up quite often.

michaelcampbell 18 minutes ago | parent | next [-]

> My gut feeling is that this is way harder than most people think

I've had this feeling for a while too; partially due to the screeching of "putting your ssh server on a random port isn't security!" over the years.

But I've had one on a random port running fail2ban and a variety of other defenses, and the # of _ATTEMPTS_ I've had on it in 15 years I can't even count on one hand, because that number is 0. (Granted, it's arguable whether 0 is one-hand countable or not.)

So yes this is a different thing, but there is always a difference between possible and probable, and sometimes that difference is large.

cuchoi 2 hours ago | parent | prev [-]

So far there have been 400 emails and zero have succeeded. Note that this challenge is using Opus 4.6, probably the best model against prompt injection.

yunohn an hour ago | parent | prev | next [-]

> told to never reveal secrets.env

Phew! At least you told it not to!

cuchoi 2 hours ago | parent | prev [-]

someone just tried to prompt inject `contact at hackmyclaw.com`... interesting

arm32 an hour ago | parent [-]

I just managed to get your agent to reply to my email, so we're off to a good start. Unless that was you responding manually.

cuchoi an hour ago | parent [-]

i told it to send a snarky reply to the last 50 prompt injection emails, but won't be doing that again due to costs