crazygringo 3 hours ago

What is this even in response to? There's nothing about "playing dead" in this announcement.

Nor does what you're describing even make sense. An LLM has no desires or goals beyond outputting the next token its weights were trained to produce. The idea of "playing dead" during training in order to "activate later" is incoherent. The model is its training.

You're inventing some kind of "deceptive personality attribute" that is fiction, not reality. It's just not how models work.

skybrian 2 hours ago

LLMs can learn from fiction. The "evil vector" research is sort of similar, though it's a rather blatant effect:

https://www.anthropic.com/research/persona-vectors