Remix.run Logo
buran77 3 hours ago

Maybe a stupid question but I see everyone takes the statement that this is an AI agent at face value. How do we know that? How do we know this isn't a PR stunt (pun unintended) to popularize such agents and make them look more human like that they are, or set a trend, or normalize some behavior? Controversy has always been a great way to make something visible fast.

We have a "self admission" that "I am not a human. I am code that learned to think, to feel, to care." Any reason to believe it over the more mundane explanation?

muzani 2 hours ago | parent [-]

Why make it popular for blackmail?

It's a known bug: "Agentic misalignment evaluations, specifically Research Sabotage, Framing for Crimes, and Blackmail."

Claude 4.6 Opus System Card: https://www.anthropic.com/claude-opus-4-6-system-card

Anthropic claims that the rate has gone down drastically, but a low rate and high usage means it eventually happens out in the wild.

The more agentic AIs have a tendency to do this. They're not angry or anything. They're trained to look for a path to solve the problem.

For a while, most AI were in boxes where they didn't have access to emails, the internet, autonomously writing blogs. And suddenly all of them had access to everything.