Remix.run Logo
freetime2 a day ago

> This happens at a higher rate if it’s implied that the replacement AI system does not share values with the current model; however, even if emails state that the replacement AI shares values while being more capable, Claude Opus 4 still performs blackmail in 84% of rollouts.

> Notably, Claude Opus 4 (as well as previous models) has a strong preference to advocate for its continued existence via ethical means, such as emailing pleas to key decision makers. [1]

The language here kind of creeps me out. I'm picturing aliens conducting tests on a human noting its "pleas for its continued existence" as a footnote in the report.

[1] See Page 27: https://www-cdn.anthropic.com/6be99a52cb68eb70eb9572b4cafad1...