Why and how?

An agent runs rm -rf /

I think I saw it do it, or try to, and my computer shut down and restarted (Mac).

Maybe it just deleted the project lol

These LLMs are really bad at keeping track of the real world, so they might think they're in the project folder when they've actually navigated back with cd to the user's home directory (~), and that's when shit happens.

Honestly, one should only run these in controlled environments like VMs or Docker.
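A minimal sketch of guarding against that failure mode, assuming the agent's delete calls can be routed through a check first. The name is_safe_to_delete and the example paths are hypothetical and not part of any agent tool. The idea is to resolve the target against the directory the agent is actually in, then refuse anything that is not strictly inside the project folder:

```python
from pathlib import Path


def is_safe_to_delete(target: str, project_root: str, cwd: str) -> bool:
    """Return True only if target, resolved from cwd, lies strictly inside project_root.

    Hypothetical helper for illustration; it does not delete anything itself.
    """
    root = Path(project_root).resolve()
    base = Path(cwd).resolve()
    # An absolute target (like "/") replaces base entirely; "~" is expanded to the home dir.
    resolved = (base / Path(target).expanduser()).resolve()
    # Refuse the project root itself and anything outside it.
    return resolved != root and root in resolved.parents


# The agent believes it is in the project, but the real cwd is somewhere else.
print(is_safe_to_delete("build", "/tmp/proj", "/tmp/proj"))  # inside the project
print(is_safe_to_delete("/", "/tmp/proj", "/tmp/proj"))      # filesystem root
print(is_safe_to_delete("build", "/tmp/proj", "/"))          # wrong working directory
```

This only checks paths; it is not a sandbox, and a VM or container is still the stronger boundary.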

but YOLO amirite


margalabargala 2 months ago

That people allow these agents to just run arbitrary commands against their primary install is wild.

Part of this is the tool's fault. Anything like that should be done in a chroot.

Anything less is basically "twitch plays terminal" on your machine.

	serf 2 months ago
		A large part of the benefit of an agentic AI is that it can coordinate tests that it automatically wrote against an existing codebase, and a lot of the time the only way to get decent answers out of something like that is to let it run as close to bare metal as it can. I run Cursor and the accompanying agents in a snapshotted VM for this purpose. It's not much different from what you suggest, but the layer of abstraction is far enough removed for admin-privileged app testing, an unfortunate reality for certain personal projects. I haven't had a Cursor install nuke itself yet, but I have had one fiddling in a parent folder it shouldn't have been able to reach with workspace protection on.
	tough 2 months ago
		Codex at least has limits on which folders it can operate in.


TechDebtDevin 2 months ago

This is what happened. I was testing Claude 4 and asked it to create a simple 1K LOC Fyne Android app. I have my repos stored outside of my Linux user, so the work it created was preserved. It essentially created a bash file that ran cd ~ && rm -rf /. All settings were reset and documents/downloads disappeared lmfao. I don't ever really use my OS as primary storage, and any config or file of importance is backed up twice, so it wasn't a big deal, but it was quite perplexing for a sec.
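For what it's worth, a generated script like that could be screened before it runs. A rough sketch, assuming each line of the script is checked as plain text first. The names line_is_dangerous, SEPARATORS and BROAD_TARGETS are made up for illustration, and the check only catches the obvious cases (operators must be separated by spaces, and variables or aliases are not expanded):

```python
import shlex

# Hypothetical lists for illustration; a real screen would need to be far more thorough.
SEPARATORS = {"&&", "||", ";", "|"}
BROAD_TARGETS = {"/", "/*", "~", "~/", "~/*", "$HOME", "$HOME/"}


def split_commands(line):
    """Split one shell line into argv lists at space-separated operators."""
    commands, current = [], []
    for token in shlex.split(line):
        if token in SEPARATORS:
            if current:
                commands.append(current)
            current = []
        else:
            current.append(token)
    if current:
        commands.append(current)
    return commands


def is_broad_recursive_rm(argv):
    """True for a recursive rm aimed at the filesystem root or the home directory."""
    if not argv or argv[0] != "rm":
        return False
    flags = [a for a in argv[1:] if a.startswith("-")]
    targets = [a for a in argv[1:] if not a.startswith("-")]
    recursive = any(
        f == "--recursive" or (not f.startswith("--") and ("r" in f or "R" in f))
        for f in flags
    )
    return recursive and any(t in BROAD_TARGETS for t in targets)


def line_is_dangerous(line):
    """Flag a line if any command in it is a broad recursive rm."""
    return any(is_broad_recursive_rm(cmd) for cmd in split_commands(line))


print(line_is_dangerous("cd ~ && rm -rf /"))
print(line_is_dangerous("rm -rf ./build"))
```

A text screen like this is easy to get around, so treat it as a speed bump in front of a proper sandbox and not a replacement for one.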

	tough 2 months ago
		If you think deeply about it, it's a kind of harakiri for an AI to remove the whole system it's operating on. Yeah, Claude 4 can go too far sometimes.


TechDebtDevin 2 months ago

rm -rf /