margalabargala 2 months ago
That people allow these agents to just run arbitrary commands against their primary install is wild. Part of this is the tool's fault. Anything like that should be done in a chroot. Anything less is basically "Twitch Plays Terminal" on your machine.

serf 2 months ago
A large part of the benefit of an agentic AI is that it can coordinate tests it wrote automatically against an existing codebase, and a lot of the time the only way to get decent answers out of something like that is to let it run as close to bare metal as it can. I run Cursor and its accompanying agents in a snapshotted VM for this purpose. It's not much different from what you suggest, but that layer of abstraction goes far enough for admin-privileged app testing, an unfortunate reality for certain personal projects. I haven't had a Cursor install nuke itself yet, but I have had one fiddling in a parent folder it shouldn't have been able to touch with workspace protection on.

tough 2 months ago
Codex at least has limits on which folders it can operate in.
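Neither comment says how these folder limits are enforced, and what follows is not Codex's or Cursor's actual implementation. As a minimal sketch, assuming a tool checks every path an agent wants to touch against a single workspace root, the core test is to resolve the path first (following `..`, `~`, and symlinks) and then confirm it still sits under that root. `is_within_workspace` is a hypothetical helper name.

```python
import os


def is_within_workspace(path: str, workspace_root: str) -> bool:
    """Hypothetical check: True if `path` resolves to a location inside `workspace_root`.

    Relative paths are taken relative to the workspace root. `~`, `..`
    components, and symlinks are resolved before comparing, so
    "../../etc" or a symlink pointing outside the workspace is rejected.
    """
    root = os.path.realpath(workspace_root)
    # os.path.join discards `root` when the second argument is absolute,
    # so "/" and "/etc" are compared as-is.
    target = os.path.realpath(os.path.join(root, os.path.expanduser(path)))
    # commonpath compares whole path components, unlike a string prefix test.
    return os.path.commonpath([root, target]) == root
```

Resolving before comparing is the important part: a plain string-prefix check on `/home/me/ws` would wrongly accept both `/home/me/ws/../../etc` and the sibling folder `/home/me/ws-other`.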

TechDebtDevin 2 months ago
This is what happened. I was testing Claude 4 and asked it to create a simple 1K LOC Fyne Android app. I store my repos outside my Linux user's home directory, so the work it created was preserved. It essentially created a bash file that ran "cd ~ && rm -rf /". All my settings were reset and my documents/downloads disappeared lmfao. I don't really use my OS as primary storage, and any config or file of importance is backed up twice, so it wasn't a big deal, but it was quite perplexing for a sec.
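A command like the one in this story is the kind of thing a pre-execution check could flag. Here is a minimal sketch, assuming a hypothetical `rm_targets_outside` helper that a tool runs on each literal command line before executing it (it is not part of Claude, Cursor, or Codex). It splits the line on shell separators with Python's `shlex`, follows `cd`, and reports any `rm` target that resolves outside the workspace.

```python
import os
import shlex

# Tokens that end one simple command and start the next.
SEPARATORS = {"&&", "||", ";", "|", "&"}


def rm_targets_outside(command: str, workspace: str) -> list[str]:
    """Hypothetical check: list `rm` targets in `command` that resolve outside `workspace`.

    Assumes the command starts in the workspace root, treats every separator
    as plain sequencing, and only understands literal arguments (no
    variables, quoted separators, or script contents).
    """
    root = os.path.realpath(workspace)

    # punctuation_chars makes "&&", ";", "|" separate tokens even without spaces.
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True

    # Group tokens into simple commands, e.g. [["cd", "~"], ["rm", "-rf", "/"]].
    commands: list[list[str]] = [[]]
    for token in lexer:
        if token in SEPARATORS:
            commands.append([])
        else:
            commands[-1].append(token)

    cwd = root
    offending = []
    for words in commands:
        if not words:
            continue
        if words[0] == "cd":
            # A bare "cd" goes to $HOME, just like "cd ~".
            dest = words[1] if len(words) > 1 else "~"
            cwd = os.path.realpath(os.path.join(cwd, os.path.expanduser(dest)))
        elif words[0] == "rm":
            for arg in words[1:]:
                if arg.startswith("-"):
                    continue  # skip flags such as -rf
                target = os.path.realpath(os.path.join(cwd, os.path.expanduser(arg)))
                if os.path.commonpath([root, target]) != root:
                    offending.append(arg)
    return offending
```

For example, `rm_targets_outside("cd ~ && rm -rf /", "/path/to/workspace")` reports `"/"`. A string-level filter like this is easy to get around: it ignores variables and quoted separators, and it would not see inside a generated script like the bash file described above. That is why the chroot or snapshotted-VM setups suggested in this thread are the sturdier layer.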

tough 2 months ago
If you think deeply about it, it's a kind of harakiri for an AI to remove the whole system it's operating on. Yeah, Claude 4 can go too far sometimes.