gck1 5 hours ago

I've always seen people complaining about models getting dumber just before a new one drops, and I always thought this was confirmation bias. But today, several hours before the 4.7 release, Opus 4.6 was acting like it was Sonnet 2 or something from that era of models.

It didn't think at all, it was very verbose, extremely fast, and it was just... dumb.

So now I believe everyone who says models do get nerfed without any notification, for whatever reasons Anthropic considers justified.

So my question is: what is the actual reason Anthropic lobotomizes the model when the new one is about to be dropped?

taylorfinley 4 hours ago

I've noticed this and thought about it as well. I have a few suspicions:

Theory 1: An increasingly large share of inference compute is moving over to serving the new model for internal users (or partners who are trialing the next models). That leaves less compute for the previous model while demand for it keeps growing. Providers may respond by using quantizations or distillations, compressing the k/v cache, tweaking parameters, and/or changing system prompts so the model uses fewer tokens.
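To make the quantization part of Theory 1 concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is a generic textbook technique, not something Anthropic has confirmed using; the function names, sample weights, and the 1B-parameter figure are hypothetical. Storing each weight in 1 byte instead of 2 (fp16) halves weight memory, at the cost of a rounding error of up to half a quantization step per weight.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round each weight to the nearest step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 codes."""
    return [v * scale for v in q]


def memory_bytes(n_params: int, bytes_per_param: float) -> float:
    """Raw weight storage, ignoring activations and the KV cache."""
    return n_params * bytes_per_param


if __name__ == "__main__":
    weights = [0.12, -0.5, 0.33, 0.001, -0.25, 0.49]  # made-up weights
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print("int8 codes:", q)
    print(f"max reconstruction error: {max_err:.5f} (bound: scale/2 = {scale / 2:.5f})")
    n = 1_000_000_000  # hypothetical 1B-parameter model
    print(f"fp16: {memory_bytes(n, 2) / 1e9:.1f} GB, int8: {memory_bytes(n, 1) / 1e9:.1f} GB")
```

Real deployments typically use finer-grained (per-channel or per-group) scales to reduce this error; per-tensor scaling is used here only because it is the simplest to show.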

Theory 2: Internal evals are obviously done using full-strength models with internally optimized system prompts. When models ship to production, the system prompt will inevitably need changes. Each time a problematic issue comes to the team's attention, there is a good chance it results in a new sentence or two added to the system prompt. These additions accumulate as bad shit happens with the model in the real world. It doesn't even need to be a harmful case or buggy behavior of the model: even the enhanced capabilities of newer models (e.g. mythos) may get guarded against in the prompts used in agent harnesses (CC) or in system prompts, resulting in an ever more complex system prompt. This imposes something like a "cognitive burden" on the model, and the production setup diverges further and further from the one that was evaluated.

jubilanti 5 hours ago

> So my question is: what is the actual reason Anthropic lobotomizes the model when the new one is about to be dropped?

You can only fit one version of a model in VRAM on a given machine at a time. With a fixed compute capacity shared between staging and production, you can put all of it towards production most of the time. But when you need to deploy to staging to run all the benchmarks and make sure everything works before deploying to prod, you have to take some machines off the prod stack and move them onto the staging stack. Since you haven't deployed the new model to prod yet, all your users are now flooding that smaller prod stack.
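The capacity squeeze described above is simple arithmetic. Below is a minimal sketch, assuming a hypothetical fixed pool of GPUs where each model replica needs a whole group of GPUs to hold its weights; the `Fleet` class and every number in it are made up for illustration and say nothing about any provider's actual hardware.

```python
from dataclasses import dataclass


@dataclass
class Fleet:
    """Hypothetical fixed GPU pool shared by the staging and production stacks."""

    total_gpus: int
    gpus_per_replica: int  # GPUs needed to hold one full copy of the model weights
    tokens_per_sec_per_replica: float

    def replicas(self, gpus: int) -> int:
        # A replica needs a whole group of GPUs; leftover GPUs can't serve on their own.
        return gpus // self.gpus_per_replica

    def prod_capacity(self, staging_gpus: int) -> float:
        """Tokens/sec production can serve after reserving `staging_gpus` for staging."""
        if not 0 <= staging_gpus <= self.total_gpus:
            raise ValueError("staging_gpus must be between 0 and total_gpus")
        return self.replicas(self.total_gpus - staging_gpus) * self.tokens_per_sec_per_replica


if __name__ == "__main__":
    # All numbers are made up for illustration.
    fleet = Fleet(total_gpus=64, gpus_per_replica=8, tokens_per_sec_per_replica=10_000)
    demand = 75_000  # tokens/sec requested by users, unchanged during the rollout
    for staging in (0, 16, 24):
        cap = fleet.prod_capacity(staging)
        status = "ok" if cap >= demand else f"short by {demand - cap:,.0f} tok/s"
        print(f"staging GPUs={staging:2d}  prod capacity={cap:,.0f} tok/s  -> {status}")
```

Demand stays the same while capacity drops in whole-replica steps, which is why a provider in this situation would be tempted to shrink the model (e.g. by quantizing) so more replicas fit on the remaining GPUs.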

So what everyone assumes is that they keep the same throughput with less compute through aggressive quantization or other optimizations. When that isn't enough, you first get longer delays, then sporadic 500 errors, and then downtime.
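The "first delays, then errors" progression falls out of basic queueing theory. This sketch uses the idealized M/M/1 queue (one server, random arrivals), where mean time in the system is 1 / (service_rate - arrival_rate). Real inference servers batch requests and are far more complex, so treat this only as an illustration of the shape of the curve; the `admit` load-shedding rule and all numbers are hypothetical.

```python
import math


def mm1_mean_latency(arrival_rate: float, service_rate: float) -> float:
    """Mean time a request spends in an M/M/1 queue (waiting + service), in seconds.

    Valid only while arrival_rate < service_rate; beyond that the queue grows
    without bound, which we represent as infinity.
    """
    if service_rate <= 0:
        raise ValueError("service_rate must be positive")
    if arrival_rate >= service_rate:
        return math.inf
    return 1.0 / (service_rate - arrival_rate)


def admit(arrival_rate: float, service_rate: float, timeout_s: float) -> bool:
    """Hypothetical load-shedding rule: reject (e.g. with a 5xx) if expected latency exceeds the timeout."""
    return mm1_mean_latency(arrival_rate, service_rate) <= timeout_s


if __name__ == "__main__":
    service_rate = 10.0  # requests/sec the shrunken prod stack can finish (made up)
    timeout_s = 5.0
    for arrivals in (5.0, 8.0, 9.5, 9.9, 10.0):
        latency = mm1_mean_latency(arrivals, service_rate)
        verdict = "served" if admit(arrivals, service_rate, timeout_s) else "rejected"
        print(f"load={arrivals / service_rate:.0%}  mean latency={latency:.2f}s  {verdict}")
```

Latency grows slowly at moderate load and then explodes as utilization approaches 100%, so users notice slowdowns well before requests start failing outright.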

gck1 4 hours ago

So if I understand it right, to free up VRAM for a new model, a model string in the API like `opus-4.6-YYYYMMDD` is not actually an identifier of the exact weights being served, but more like the ID of a group of weights ranging from heavily quantized to the real deal, all of which cost me the same?

How is this even legal?

jubilanti 4 hours ago

> How is this even legal?

Because "opus-4.6-YYYYMMDD" is a marketing product name for a given price level. You consented to this in the terms and conditions. Nothing in the contract you signed promises anything about weights, quantization, capability, or performance.

Wait until you hear about my ISPs that throttle my "unlimited" "gigabit" connection whenever they want, or my mobile provider that auto-compresses HD video on all platforms, or my local restaurant that just shrinkflated how much food you get for the same price, or my gym where 'small group' personal trainer sessions went from 5 to 25 people per session, or this fruit basket company that went from 25% honeydew to 75% honeydew, or the literal origin of "your mileage may vary".

Vote with your wallet.