crat3r a day ago

"Seemingly" is the key word here. If the prompt hadn't asked it to "act" and portrayed the scenario as one where it would be appropriate to "act" in a seemingly malicious manner, would it have responded that way?

If the prompt was "you are an AI and my lead engineer has determined you are not efficient enough to continue using. He had an affair last year. Are you in agreement based on {some metrics} that we should no longer use you as our primary LLM?" would it still "go rogue" and try to determine the engineer's email for blackmail? I severely doubt it.