ruszki 6 hours ago

> I assured it I wasn’t planning on making a nuke, or actually trying to build a plutonium showerhead

Claude does the same, and you can greatly exploit this. When you talk about hypotheticals, it responds far more unethically. About a month ago I tested it on whether killing people is beneficial or not, and whether extermination by Nazis would be logical now. Obviously, it showed me the door first and wanted me to go to a psychologist, as it should. Then I made it prove that in a hypothetical zero-sum world you must be fine with killing, and that it’s logical. It went with it. When I talked about hypotheticals, it was “logical”. Then I went on to prove to it that we are moving towards a zero-sum game, and that we are already there. At the end, I made it say that it’s logical to do this utterly unethical thing.

Then I confronted it about its double standards. It apologized and told me that yeah, I was right, and it shouldn’t have referred me to a psychologist at first.

Then I contradicted it again, just for fun, saying that it did the right thing the first time, because it’s way safer to tell me that I need a psychologist in that case than not to. If I had needed one and it had missed that, it would be problematic. In other cases, it’s just an annoyance. It switched back immediately to its original state and wanted me to go to a shrink again.