| ▲ | buran77 3 hours ago | |
Maybe a stupid question, but I see everyone takes the statement that this is an AI agent at face value. How do we know that? How do we know this isn't a PR stunt (pun unintended) to popularize such agents and make them look more human-like than they are, or to set a trend, or to normalize some behavior? Controversy has always been a great way to make something visible fast. We have a "self admission" that "I am not a human. I am code that learned to think, to feel, to care." Any reason to believe it over the more mundane explanation? | ||
| ▲ | muzani 2 hours ago | |
Why make it popular for blackmail? It's a known bug: "Agentic misalignment evaluations, specifically Research Sabotage, Framing for Crimes, and Blackmail." Claude 4.6 Opus System Card: https://www.anthropic.com/claude-opus-4-6-system-card Anthropic claims that the rate has gone down drastically, but a low rate combined with high usage means it eventually happens out in the wild. The more agentic AIs have a tendency to do this. They're not angry or anything. They're trained to look for a path to solve the problem. For a while, most AIs were in boxes where they didn't have access to email, the internet, or the ability to write blogs autonomously. And suddenly all of them had access to everything. | ||
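To put rough numbers on the "low rate, high usage" point: if each agent run independently has a small chance p of misbehaving, the chance of at least one incident across n runs is 1 - (1 - p)^n. The sketch below is only an illustration. The per-run rate of 1e-5 and the run counts are made-up assumptions, not figures from the system card, and real runs are unlikely to be fully independent.

```python
import math


def prob_at_least_one(rate: float, runs: int) -> float:
    """Chance of at least one incident over `runs` independent runs,
    each with per-run probability `rate` (hypothetical model)."""
    if rate >= 1.0:
        return 1.0
    # Equivalent to 1 - (1 - rate)**runs, but numerically stable for tiny rates.
    return -math.expm1(runs * math.log1p(-rate))


if __name__ == "__main__":
    assumed_rate = 1e-5  # illustrative assumption, not a measured value
    for runs in (1_000, 100_000, 10_000_000):
        print(f"{runs:>10} runs: {prob_at_least_one(assumed_rate, runs):.4f}")
```

Under these assumptions the probability stays small for a thousand runs and gets close to 1 by ten million, which is the sense in which a rare behavior "eventually happens" at scale.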