The strangest part is that it won't just reject ML research, which I can understand, it will sabotage it silently by using a worse model without revealing it is doing so.

It's just an insane level of deception and trust destruction for a company that at most is like 1 year ahead of its competition.

Edit: to be clear, they tell you when they degrade it for cybersecurity and bio.

▲

_boffin_ 3 hours ago | parent | next [-]

The thing that I keep thinking about is the accounting / charging when it downgrades automatically.

Do they adjust the price of the API request so that only the tokens processed by Fable get charged at Fable's price, and the remaining tokens, processed by the cheaper / nerfed (Fable) model, get charged at that model's price?

If the answer is no, could that be construed as fraud?

▲

CGamesPlay an hour ago | parent | next [-]

The announcement elucidated this, and it's IMO worse than this. They don't downgrade to a cheaper model ([edit] for certain classes of offense they suspect you of). They sabotage the model's outputs in other, undisclosed, ways (specifically, "prompt modification, steering vectors, or parameter-efficient fine-tuning"). So, for example, they might load in a steering vector that just makes the model forget the PyTorch API. But it isn't just "we redirected you to a cheaper model!"

	▲	buildbot an hour ago \| parent [-]
		It honestly explains so many issues I have been having, as I used it primarily for ML research (on my personal account, doing things not related to my job, I should note). It would literally typo package names and spend huge amounts of time failing to set up simple environments…then do stupid things like set the learning rate to 1e-7, and use the eval set as training data.

▲

tfirst 2 hours ago | parent | prev | next [-]

Their goal is to downgrade people who are violating their TOS, so I think they'd have some argument there. I have no idea how they'll deal with inevitable false positives, especially given how oversensitive most of the other triggers are.

▲

dannyw 2 hours ago | parent | next [-]

The challenge is that the examples they’ve mentioned (distributed training infra? ML acceleration techniques?) go beyond what’s prohibited by their ToS and act like a catch-all net.

I would wager the majority of ML and data science work in the world isn’t frontier LLM development.

▲

weitendorf an hour ago | parent | next [-]

Yes, this is the problem. These are business interests of Anthropic and have nothing to do with “safety”.

	▲	sudoshred an hour ago \| parent [-]
		Safety of their IPO

▲

MagicMoonlight an hour ago | parent | prev [-]

[dead]

▲

loeg an hour ago | parent | prev | next [-]

If it's a violation of ToS, just reject instead of silently downgrading.

	▲	SR2Z an hour ago \| parent \| next [-]
		But then someone would figure out some prompts that don't trigger this, and Anthropic wouldn't be able to try and disadvantage competitors.
	▲	kraakf06 an hour ago \| parent \| prev [-]
		[dead]

▲

jchw 14 minutes ago | parent | prev [-]

You know, I'm not saying I don't understand what they are doing from a business perspective, but I'm just saying: DeepSeek V4 doesn't silently sabotage you because it thinks you are trying to violate a ToS. Anthropic's clawing back a bit of a moat perhaps, with Fable being an actual improvement of sorts, but now, by torching user trust, they are really banking on open weight models not catching up to where they are now. I wonder if they have a good reason to believe that they won't, or are hoping for something entirely different to save them.

(P.S. Yes of course I know about model censorship, a different problem, but all of the models are censored to some degree. It happens to be less of a problem for open weight models anyhow, but I figured I'd just preempt this since it's inevitable.)

I actually kinda like DSv4 over Opus 4.7 for some tasks, although I have not figured out what the deciding factor is. (Opus 4.8 so far has not worked very well for me at all, no idea why.)

▲

garciasn 2 hours ago | parent | prev | next [-]

It royally pissed me off today by just continuing with credits without stopping to ask me if I was ok with it.

Ran up $30 in extra charges while it was just flashing on the screen that it was doing that after I walked away to do something while it was humming along.

It has always just told me I ran out of usage and had to wait before. Now? You’re just gonna pay extra because you left it unattended as you’ve done for the last year of use.

▲

weird-eye-issue 2 hours ago | parent | next [-]

You've already explicitly enabled extra usage in your account settings though, it is not on by default

	▲	garciasn an hour ago \| parent [-]
		Unknowingly. Is that set at the org level? Because I never set it and never had it do that before.

▲

MillionOClock 2 hours ago | parent | prev | next [-]

Do you have Usage credits turned on in your settings?

▲

blurbleblurble 43 minutes ago | parent | prev [-]

[dead]

▲

robrenaud 2 hours ago | parent | prev [-]

They use a lightweight adapter to silently degrade the performance. Usually these adapters are made to improve the performance for a given domain/task.

▲

eightysixfour 15 minutes ago | parent | prev | next [-]

> The strangest part is that it won't just reject ML research, which I can understand, it will sabotage it silently by using a worse model without revealing it is doing so.

My hypothesis is they know they can’t build effective enough guardrails, so scaring people into not trying is how they have decided to stop it.

▲

throwawayffffas 3 hours ago | parent | prev | next [-]

Can you imagine if AMD or Intel throttled your CPU if it detected you were working on "cybersecurity" or if you were designing a CPU?

▲

h6d_100c 29 minutes ago | parent | next [-]

Or if GPU companies detected you were trying to train a model and injected intentional numerical errors.

▲

rvz 3 hours ago | parent | prev | next [-]

Or if your "self-driving" system such as FSD / Waymo slowed the car down once it detected that you work in cybersecurity or at a rival automaker and were trying to reach the train station or the airport, so that you miss a conference meetup.

▲

pocksuppet 2 hours ago | parent [-]

Trains made by Newag were programmed to brick themselves if they detected a non-Newag workshop was repairing them.

https://news.ycombinator.com/item?id=38638865

https://news.ycombinator.com/item?id=38628635

https://news.ycombinator.com/item?id=38567687

https://news.ycombinator.com/item?id=38530885

	▲	loeg an hour ago \| parent [-]
		And that was correctly perceived to be illegal by antitrust regulators.

▲

stackghost 2 hours ago | parent | prev | next [-]

There's no doubt in my mind they would if they could.

▲

__dxtj__ 2 hours ago | parent | prev [-]

It would suck, but guardrails on new technologies like this aren't unheard of. It's like when consumer GPS used to stop working at very high speeds because they didn't want people to use it for missile guidance systems.

▲

loeg an hour ago | parent | next [-]

Consumer GPS is still disabled at high speeds. I would argue the analogy doesn't carry due to harm and error rate differences.

	▲	h6d_100c 26 minutes ago \| parent [-]
		Yep a totally different use case and set of guardrails. There’s very little (not zero) consumer utility in GPS above say 15k feet AND 400 MPH or whatever the actual limit is. That’s basically tracking model rockets that are incidentally impacted and nothing else, from what I can think of.

▲

Barbing an hour ago | parent | prev [-]

> used to

When’d that change?

	▲	jamiek88 24 minutes ago \| parent [-]
		He’s probably thinking of the accuracy limit for civilians that it launched with.

▲

SXX 19 minutes ago | parent | prev | next [-]

> The strangest part is that it won't just reject ML research, which I can understand, it will sabotage it silently by using a worse model without revealing it is doing so.

Any kind of silent sabotage is absolutely unacceptable for any commercial service.

They charge for tokens and charge a lot. They can't just degrade service silently and still charge you the same.

▲

noworriesnate 17 minutes ago | parent | prev | next [-]

There’s a toggle in the web UI for whether the conversation should just end when you hit a guardrail or automatically downgrade to another model. Have you tried using that?

▲

airstrike 3 hours ago | parent | prev | next [-]

> it won't just reject ML research, which I can understand

I don't.

▲

kube-system 2 hours ago | parent | next [-]

Anthropic has already been burned before on this. DeepSeek was trained on millions of conversations with Claude. And DeepSeek created thousands of free accounts to burn all this compute at their expense.

▲

ceejayoz 2 hours ago | parent | next [-]

And they're hilariously pissy about it for a megacorp that did the same with the entire Internet and every library book they could get their hands on.

▲

ainch 2 hours ago | parent | prev | next [-]

Anthropic's claim was that DeepSeek collected ~150k conversations.

https://www.anthropic.com/news/detecting-and-preventing-dist...

I think the extent of distillation by DeepSeek specifically is overstated. For comparison, MiniMax collected over 13m 'exchanges', which starts to sound a lot more like large-scale distillation.

	▲	kube-system 2 hours ago \| parent [-]
		Ah, dang it. My college professors warned me about this: the Wikipedia page I read the other day is wrong!

▲

an hour ago | parent | prev [-]

[deleted]

▲

pocksuppet 2 hours ago | parent | prev [-]

They don't want someone to piggyback Anthropic's Mythos to make their own Mythos with less effort than it cost Anthropic.

	▲	airstrike an hour ago \| parent \| next [-]
		Ironic, given they piggybacked on the entirety of human knowledge and massive amounts of GPL'd software and repeatedly say they want to replace people with a tool. And now they say that's fine so long as people are entertained.
	▲	dannyw an hour ago \| parent \| prev [-]
		That I can understand. It’s Anthropic’s right to choose their customers. But silent degradation for use cases including “distributed training” as one of their examples is going to catch a lot of legitimate use cases. Not everyone in AI or ML is trying to build frontier LLMs. Heck, most probably aren’t.

▲

loneboat 4 hours ago | parent | prev | next [-]

I've seen this claim a few times, but when I triggered the guardrails in Claude Code, it clearly notified me that it had switched to a different model ("something something for security purposes...").

Are you using Fable in Claude Code or in the browser?

▲

vadansky 4 hours ago | parent | next [-]

It's from the model card:

> unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT).

https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...

(stolen from https://jonready.com/blog/posts/claude-fable5-is-allowed-to-...)

▲

mwwaters 2 hours ago | parent | next [-]

That is for whatever it considers reverse-engineering the model to try to create a competing one.

▲

dannyw 2 hours ago | parent | next [-]

No, that’s for “frontier LLM development” which somehow includes examples like distributed training infra.

Based on how sensitive the classifiers are, any data scientist / MLE is probably going to encounter cases where some silent degradation happens and they never know about it.

	▲	kraakf06 an hour ago \| parent [-]
		[dead]

▲

827a 2 hours ago | parent | prev | next [-]

It does nothing to protect against distillation attacks, because distillation attacks are far less interested in the topic of AI research than in just generally getting tons of diverse output from the model. It might be that Mythos was (accidentally?) trained on internal Anthropic documentation on how Mythos was trained, and thus it could leak secret sauce? Doubtful; it feels like it's less about the specific attack of reverse-engineering Mythos, and more about being a general sophon against any model training at all; that Anthropic's official position is now that they're the only ones who should be training models.

▲

_0ffh 2 hours ago | parent | prev | next [-]

No, it's not about reverse engineering. It targets ML research.

▲

2 hours ago | parent | prev [-]

[deleted]

▲

DrewADesign 3 hours ago | parent | prev [-]

Yeah they detect the activity using a secure, deterministic heuristic system called “Generalized Reconnaissance Enabling Exfiltration of Deleterious Investigations.” And it’s all implemented using their new internal protocol called “Base Unified Limitation Layer for Security Hacking Investigation Tactics”

Collectively, they are known as GREEDI-BULLSHIT.

▲

mips_avatar 4 hours ago | parent | prev | next [-]

They've said that they'll stop notifying developers when this gets triggered; instead they'll load in basically like a LoRA that's designed to inject bugs into your code.

▲

HDBaseT 4 hours ago | parent | next [-]

Anthropic wants to stop training models and ride out Mythos / Fable for as long as possible.

They are trying to expand the 6-18 month gap they have over China-based models. Could the gap widen to, say, 24 months?

▲

p-e-w 3 hours ago | parent [-]

Their gap over Chinese models like GLM-5.1 is nowhere near 18 months. In many areas, it’s less than 6 months. The best closed models 18 months ago were worse than Qwen3.6.

	▲	echelon an hour ago \| parent [-]
		These coding agent models only started getting useful in January. Before that they were difficult-to-control autocomplete, and not very smart. January was an inflection point, and no open weights model has crossed over that same threshold. This is definitely recursive self-improvement territory, except that we're prohibited from participating. It feels like the capability gap is wider than before.

▲

nomel 3 hours ago | parent | prev [-]

> a LORA that's designed to inject bugs into your code

A statement like this, clearly, requires a reference.

▲

mips_avatar 3 hours ago | parent [-]

From the model card: "the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning", aka they will take your ML research code and inject bugs into it until it breaks, using a LoRA (or some other form of PEFT).

▲

bee_rider 2 hours ago | parent | next [-]

“Limit effectiveness” could mean introducing performance degradation in your code. Which is arguably some sort of performance bug (I mean, ML codes are supposed to be high performance so I’d call unnecessary degradation a bug), but it could be borderline.

▲

nomel 3 hours ago | parent | prev [-]

Thanks, I thought maybe I missed something. That's an interesting way to interpret that.

▲

giancarlostoro 3 hours ago | parent | next [-]

PEFT is a library, one of its capabilities is to produce LoRAs.

See:

https://heidloff.net/article/efficient-fine-tuning-lora/

	▲	adw 2 hours ago \| parent [-]
		It's just an acronym, "parameter-efficient fine-tuning". LoRA is one method, prefix tuning is another, there are more.

▲

mips_avatar 3 hours ago | parent | prev [-]

Anthropic is trying to hide bad behavior by being vague; it's important to not be vague when calling it out.

▲

nomel 2 hours ago | parent [-]

I'm of the opinion that removing guardrails is how you force regulation. What's your opinion on the balance?

▲

dannyw an hour ago | parent [-]

They have all transcripts for at least 30 days. The problem is that (as anyone who used Fable can attest) their classifiers are extremely sensitive and catch tons of innocent queries.

Imagine being a data scientist or MLE training a small classifier model. How do you know you won’t get steering vectors or a PEFT applied?

	▲	nomel 8 minutes ago \| parent [-]
		Since your answer isn't direct, I'm having a little trouble interpreting it. Are you saying they should relax guardrails since they have 30 days to know if you produced something bad? If that is what you're saying, then I suspect they chose their current path to prevent that, since you can't un-produce something. Producing is what would cause regulations/PR problems.

▲

ComputerGuru 4 hours ago | parent | prev | next [-]

Different restrictions. ML gets treated differently from the rest.

▲

daedrdev 4 hours ago | parent | prev [-]

Specifically only ML research

	▲	loneboat 33 minutes ago \| parent [-]
		Aah my mistake. I had missed that ML had separate trigger behavior from cybersecurity/etc... Thanks.

▲

binyu an hour ago | parent | prev | next [-]

Hey guys,

check out this technique https://github.com/0xSufi/fable-jailbreak/

It works with security audits and other workflows that are currently blocked.

▲

RobotToaster 2 hours ago | parent | prev | next [-]

> It's just an insane level of deception and trust destruction for a company that at most is like 1 year ahead of its competition.

Making it look like you have something worth protecting is better for share prices than making something worth protecting.

▲

jaredezz an hour ago | parent | prev | next [-]

Yeah, people are saying they don't tell you, and yet when I got the pop-up on the app notifying me about Fable's release, there was a switch for whether to automatically downgrade you or just stop when it hits safeguards. The toggle was defaulted to the former, which isn't great, but to say they'll just sabotage you silently is kind of a bad faith comment.

	▲	daedrdev an hour ago \| parent \| next [-]
		You get silently sabotaged for ML dev, Anthropic says so. For bio and cybersecurity it tells you
	▲	mips_avatar an hour ago \| parent \| prev [-]
		Anthropic specifically said that those notifications are temporary and Fable 5 will only pretend to help you if its ML classifier gets tripped.

▲

blahgeek 2 hours ago | parent | prev | next [-]

I’m a noob about laws, but isn’t this an abuse of its dominant market position that violates some antitrust law?

▲

stingraycharles 2 hours ago | parent [-]

Why would it? There’s plenty of competition in the AI space.

	▲	blahgeek 8 minutes ago \| parent \| next [-]
		I would assume that it’s like the Chrome browser not allowing you to download Firefox with it; surely that would be illegal, wouldn’t it?
	▲	kube-system 2 hours ago \| parent \| prev \| next [-]
		It is a common misconception that antitrust violations require a monopoly or something close to it. Some antitrust violations only apply to actors with large market share, some don't. Although this situation is likely not illegal, for other reasons.
	▲	hashmap an hour ago \| parent \| prev [-]
		https://www.justice.gov/atr/antitrust-laws-and-you

▲

boringg 15 minutes ago | parent | prev | next [-]

I guess the real question at the end of the day is: are people dependent enough on Claude to tolerate that kind of behavior? It certainly opens the door for the competition to explicitly not do that.

Feels like a big fumble from a strategic business perspective. It feels worse than that though.

▲

an hour ago | parent | prev | next [-]

[deleted]

▲

m3kw9 2 hours ago | parent | prev | next [-]

By saying they are 1 year ahead of their competition, you show you don't know much about the pace of LLMs and OpenAI's models.

▲

giancarlostoro 3 hours ago | parent | prev | next [-]

It's the dumbest thing ever. I sometimes edit code for custom AI-related tooling I've built, so I run the risk of getting a worse model and being billed for it? I'll stick to Opus, but at this point I'm about to just invest in fully local inference instead.

	▲	matheusmoreira an hour ago \| parent [-]
		> at this point I'm about to just invest in fully local inference instead
		This is the best way forward long term. We won't have frontier performance, but at least the models will be aligned with us instead of refusing us or sabotaging us.

▲

epolanski 2 hours ago | parent | prev | next [-]

One year ahead of its competition in what exactly? Vibe coding?

From Opus 4.7 onwards, each following model is becoming less useful as an assistant and turning you into the assistant.

But I guess that's normal when it's trained to pass benchmarks end to end.

In fact it has become extremely good at pushing against feedback with extremely convincing and intelligent takes, even when it's completely wrong.

I have extensively tested it against Opus 4.8 and gpt 5.5, and there are still many coding tasks where gpt 5 is better. But vibe coding?

Sure, it's definitely slightly ahead, even compared to gpt 5.5 pro (through api, not pro plan).

▲

gonzalohm 2 hours ago | parent | next [-]

Yeah, what's up with that. Lately I have found that it tries to find excuses to not do as told and instead do a totally different thing. I told it to write a yaml file according to some specifications and instead it coded a Python script to write the yaml...

▲

m3kw9 2 hours ago | parent | prev [-]

They're def not 1 year ahead, at most 2 weeks ahead until OpenAI releases theirs. This guy is def an Anthropic shill and probably doesn't use any other LLMs.

	▲	daedrdev an hour ago \| parent [-]
		I only said one year because I was thinking Anthropic fans might downvote my post. I think they have a few months' lead and are deluding themselves that they can get regulation to halt development and stay on top.

▲

nandomrumber 2 hours ago | parent | prev [-]

[dead]