folderquestion 4 hours ago
Another datapoint: Anthropic's explanation for Claude's blackmail attempts is not that Claude developed a genuine self-preservation instinct (which would be Kelly's "strange loop"). Instead:
In other words:
This is exactly your windows metaphor at scale:
Training data window                | Behavior during testing
Evil AI portrayals                  | Blackmail, self-preservation simulation
Admirable AI stories + constitution | No blackmail

There is no persistent self that "chose" to be evil or good. There are only different windows (training influences) that get averaged or triggered.