Show HN: Continue? Y/N: A 60-second game about AI agent permission fatigue (llmgame.scalex.dev)
7 points by Wirbelwind 3 hours ago | 10 comments
sevenseacat a minute ago | parent | next [-]

Continue? Y/N ── SCORE: 2,343 Security-Conscious Engineer

Caught 8/8 threats "Not a single secret leaked"

→ llmgame.scalex.dev

MeetingsBrowser 5 minutes ago | parent | prev | next [-]

It would be cool to see the distribution of all player scores.

carterschonwald 8 minutes ago | parent | prev | next [-]

Some of the sandboxing I've been playing with gives me the best of both worlds: YOLO mode plus logic-programming-tier permissions on LLM actions in the environment. Still not ready for prime time though ;)

chuckadams 9 minutes ago | parent | prev | next [-]

Got a nearly perfect score, I just blocked one safe command. Frankly it was pretty easy, whereas a lot of the stuff I allow in real life is "contains brace with quote character" with a dozen lines of bash script that doesn't even display fully unless I hit ctrl-o, and no I don't scrutinize every one of those. These days I use auto mode, which eats more tokens and isn't infallible, but FSM knows it's looking at them harder than I am.

cadwell 11 minutes ago | parent | prev | next [-]

1,640 points on my first try—I fell into a few traps, but it was really interesting. Thanks for the little game! I'm sharing it with my coworkers :)

nardib 2 hours ago | parent | prev [-]

Use this and save yourself:

claude --dangerously-skip-permissions

tasuki 4 minutes ago | parent | next [-]

Just make sure to run it in an isolated environment where it's ok to mess things up, and make sure it doesn't have access to any secrets.
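
A rough sketch of the "no access to secrets" half of that advice, assuming the secrets live in environment variables. The names `SECRET_NAME`, `scrubbed_env`, and `run_scrubbed` are made up for illustration, and the name pattern is a guess at what "secret-looking" means. This only strips variables and switches to a throwaway working directory; it is not a sandbox, so a container or VM is still needed for real isolation.

```python
import os
import re
import subprocess
import sys
import tempfile

# Hypothetical pattern for "secret-looking" variable names; it over-matches on purpose.
SECRET_NAME = re.compile(r"(KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL)", re.IGNORECASE)


def scrubbed_env(env):
    """Return a copy of env without variables whose names look like secrets."""
    return {k: v for k, v in env.items() if not SECRET_NAME.search(k)}


def run_scrubbed(cmd, env=None):
    """Run cmd in a fresh temporary directory with a scrubbed environment.

    This does NOT restrict filesystem or network access; it only limits
    what the child process inherits through environment variables.
    """
    clean = scrubbed_env(os.environ if env is None else env)
    with tempfile.TemporaryDirectory() as workdir:
        return subprocess.run(
            cmd, cwd=workdir, env=clean,
            capture_output=True, text=True, check=False,
        )


if __name__ == "__main__":
    # Show which variable names the child process actually sees.
    result = run_scrubbed(
        [sys.executable, "-c", "import os; print(sorted(os.environ))"]
    )
    print(result.stdout)
```

The same wrapper could launch an agent CLI instead of the Python one-liner, but anything readable on disk (dotfiles, SSH keys, cloud credential files) is still reachable unless the process is also confined by a container or VM.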

wildpeaks 12 minutes ago | parent | prev | next [-]

This is why having a human in the loop isn't enough: they will cut corners and skip the reviews they should be doing, then blame the tool when something goes wrong as a result.

chuckadams 7 minutes ago | parent [-]

A tool that pushes people into permission fatigue is in fact the proper recipient of the blame. The tool in question here is the entire system, though, including the OS with its insufficient permission boundaries in userspace, not just the agent.

qsxfthnkp2322 14 minutes ago | parent | prev [-]

I love it when Claude is dangerous