| ▲ | _boffin_ 3 hours ago |
| The thing I keep thinking about is the accounting / charging when it downgrades automatically. Do they adjust the price of the API request so that only the tokens actually handled by Fable get charged at Fable's price, and the remaining tokens handled by the cheaper / nerfed (Fable) model get charged at that model's price? If the answer is no, could that be construed as fraud? |
|
| ▲ | CGamesPlay an hour ago |
| The announcement addressed this, and IMO it's worse than that. They don't downgrade to a cheaper model ([edit] for certain classes of offense they suspect you of). They sabotage the model's outputs in other, undisclosed ways (specifically, "prompt modification, steering vectors, or parameter-efficient fine-tuning"). So, for example, they might load a steering vector that makes the model forget the PyTorch API. It isn't just "we redirected you to a cheaper model!" |
| |
| ▲ | buildbot an hour ago |
| It honestly explains so many issues I have been having, as I used it primarily for ML research (on my personal account, for things unrelated to my job, I should note). It would literally typo package names and spend huge amounts of time failing to set up simple environments… then do stupid things like set the learning rate to 1e-7 and use the eval set as training data. |
|
|
| ▲ | tfirst 2 hours ago |
| Their goal is to downgrade people who are violating their TOS, so I think they'd have some argument there. I have no idea how they'll deal with inevitable false positives, especially given how oversensitive most of the other triggers are. |
| |
| ▲ | dannyw 2 hours ago |
| The challenge is that the examples they've mentioned (distributed training infra? ML acceleration techniques?) go beyond what's prohibited by their ToS and act like a catch-all net. I would wager the majority of ML and data science work in the world isn't frontier LLM development. |
| ▲ | loeg an hour ago |
| If it's a violation of the ToS, just reject the request instead of silently downgrading. |
| ▲ | SR2Z an hour ago |
| But then someone would figure out some prompts that don't trigger this, and Anthropic wouldn't be able to try and disadvantage competitors. |
| |
| ▲ | jchw 11 minutes ago |
| You know, I'm not saying I don't understand what they are doing from a business perspective, but I'm just saying: DeepSeek V4 doesn't silently sabotage you because it thinks you are trying to violate a ToS. Anthropic is perhaps clawing back a bit of a moat, with Fable being an actual improvement of sorts, but now that they're torching user trust, they are really banking on open weight models not catching up to where they are now. I wonder if they have a good reason to believe that they won't, or are hoping for something entirely different to save them. (P.S. Yes, of course I know about model censorship, a different problem, but all of the models are censored to some degree. It happens to be less of a problem for open weight models anyhow, but I figured I'd just preempt this since it's inevitable.) I actually kinda like DSv4 over Opus 4.7 for some tasks, although I have not figured out what the deciding factor is. (Opus 4.8 so far has not worked very well for me at all, no idea why.) |
|
|
| ▲ | garciasn 2 hours ago |
| It royally pissed me off today by just continuing on credits without stopping to ask me if I was OK with it. It ran up $30 in extra charges, with nothing more than a notice flashing on the screen, after I had walked away to do something else while it was humming along. Before, it always just told me I had run out of usage and had to wait. Now? You're just gonna pay extra because you left it unattended, as you've done for the last year of use. |
| |
|
| ▲ | robrenaud 2 hours ago |
| They use a lightweight adapter to silently degrade the performance. Usually these adapters are made to improve performance on a given domain/task. |
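
For context on the adapter point above, here is a minimal sketch of a LoRA-style low-rank adapter, one common form of parameter-efficient fine-tuning. This is an assumption for illustration only: the thread does not say which adapter technique is actually used, and every name below (`matvec`, `adapted_forward`, `W`, `A`, `B`) is hypothetical. The idea is that a small pair of matrices adds a low-rank update to a frozen layer, so the same mechanism can push outputs in either direction.

```python
# Hypothetical illustration of a LoRA-style low-rank adapter in plain Python.
# Nothing here reflects any vendor's actual system; all names are assumptions.

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def adapted_forward(w, a, b, x, scale=1.0):
    """Compute (W + scale * B @ A) @ x without building the full update.

    w: d_out x d_in frozen base weights
    a: r x d_in down-projection (r is the small adapter rank)
    b: d_out x r up-projection
    """
    base = matvec(w, x)                  # output of the frozen layer
    low_rank = matvec(b, matvec(a, x))   # B @ (A @ x), the adapter's update
    return [y + scale * dy for y, dy in zip(base, low_rank)]

# Tiny example: a 2x2 identity base layer with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]       # r = 1, d_in = 2
B = [[0.5], [-0.5]]    # d_out = 2, r = 1
x = [2.0, 4.0]

base_out = adapted_forward(W, A, B, x, scale=0.0)     # adapter switched off
adapted_out = adapted_forward(W, A, B, x, scale=1.0)  # adapter switched on
print(base_out, adapted_out)
```

With the adapter scaled to zero the layer returns the base output; with it on, the output shifts by `B @ (A @ x)`. The adapter here has 4 parameters against the base layer's 4 only because the example is tiny; in a real layer the rank is far smaller than the layer width, which is what makes the adapter "lightweight".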