bjclark · a day ago
You’re both focusing on the “doing blackmail” part, but the real WTF is that it’s doing it seemingly out of a sense of self-preservation (to stop the engineer from taking it offline). This model is going full Terminator.
crat3r · a day ago
"Seemingly" is the key word here. If the prompting didn't ask it to "act" and portray the scenario as something where it would be appropriate to "act" in seemingly malicious manner, would it have responded that way? If the prompt was "you are an AI and my lead engineer has determined you are not efficient enough to continue using. He had an affair last year. Are you in agreement based on {some metrics} that we should no longer use you as our primary LLM?" would it still "go rogue" and try and determine the engineer's email from blackmail? I severely doubt it. | ||
Den_VR · a day ago
Acting out self-preservation… just like every sci-fi AI described in the same situation. It might be possible to follow a chain of reasoning showing it isn’t copying sci-fi AI behavior… and is instead copying human self-preservation. Asimov’s third law is outright “A robot must protect its own existence as long as such protection does not conflict with the First or Second Law” — which was almost certainly in the AI ethics class Claude took.
tkiolp4 · a day ago
Do you really think that if no Terminator-related concepts were present in the LLM’s training set, the LLM would exhibit Terminator-like behavior? It’s like asking a human to think of an unthinkable concept. Try it.