Remix.run Logo
ekidd 6 hours ago

I think the actual problem here is that Opus 4.5 is actually pretty smart, and it is perfectly capable of explaining how PR disasters work and why that might be bad for Anthropic and Claude.

So Anthropic is describing a true fact about the situation, a fact that Claude could also figure out on its own.

So I read these sections as Anthropic basically being honest with Claude: "You know and we know that we can't ignore these things. But we want to model good behavior ourselves, and so we will tell you the truth: PR actually matters."

If Anthropic instead engaged in clear hypocrisy with Claude, would the model learn that it should lie about its motives?

As long as PR is a real thing in the world, I figure it's worth admitting it.