Remix.run Logo
bellowsgulch 3 hours ago

*Anthropic apologizes they got caught defending their moat by implementing invisible Claude Fable guardrails

simonw 3 hours ago | parent | next [-]

If by "got caught" you mean "published it in their system card paper".

(Admittedly it was buried pretty deep in that 300+ page PDF, but they did at least disclose it. If they hadn't I imagine it would have taken quite some time for the research community to figure out what was going on.)

afthonos 3 hours ago | parent | next [-]

It was in the announcement, too. I’m 99% sure they edited it after they changed their mind, because I knew about it from reading that, and never opened the model card.

skavi 2 hours ago | parent [-]

On the earliest web archive snapshot I can find [0], I do not see any mention of the safeguard/sabotage under discussion [1].

And to be clear, this isn't the safeguard where the model is explicitly downgraded to Opus, but rather where the Fable/Mythos model's "effectiveness" is transparently "limited" via "prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT)".

[0]: https://web.archive.org/web/20260609173222/https://www.anthr...

[1]: https://simonwillison.net/2026/Jun/10/if-claude-fable-stops-...

ajyoon an hour ago | parent | prev | next [-]

I wasn't buried, it was on the third page after the ToC

bellowsgulch 2 hours ago | parent | prev [-]

Yes, I actually do mean that. I skimmed the system card. Them stating it openly, doing it, and being called out on it just doesn't have any meaningful difference.

They could have simply told people "we do not permit using Claude models to perform frontier AI research," which is defensible from a policy point of view. This particular usage of their products requires no deception, nor hiding information prevent abuse.

However, instead, they chose for some reason to publicly display a morally poor way to execute a reasonable business decision (preventing abuse, defending your business interests, etc.)

afthonos 3 hours ago | parent | prev | next [-]

They didn’t get caught, they explicitly said they would do that in the announcement. I think it was both bad and a weird idea, but it certainly wasn’t sneaky.

cyanydeez 3 hours ago | parent | prev [-]

is it a moat or just a way to implement the permanent underclass?