gck1 | 5 hours ago
I've always seen people complaining about models getting dumber just before the new one drops, and I always thought this was confirmation bias. But today, several hours before the 4.7 release, Opus 4.6 was acting like Sonnet 2 or something from that era of models. It didn't think at all, it was very verbose, extremely fast, and it was just... dumb. So now I believe everyone who says models get nerfed without any notification, for whatever reasons Anthropic considers just. So my question is: what is the actual reason Anthropic lobotomizes the model when the new one is about to be dropped?
taylorfinley | 4 hours ago
I've noticed this and thought about it as well. I have a few suspicions:

Theory 1: An increasingly large share of inference compute is being moved over to serving the new model for internal users (or partners trialing the next models). This leaves less compute for the same, still-growing demand on the previous model. Providers may respond by using quantizations or distillations, compressing the KV cache, tweaking sampling parameters, and/or changing system prompts to try to use fewer tokens.

Theory 2: Internal evals are obviously run against full-strength models with internally optimized system prompts. When a model ships to production, the system prompt inherently needs changes. Each time a problematic issue rises to the team's attention, there's a solid chance it results in a new sentence or two added to the system prompt. These accrete over time as bad shit happens with the model in the real world. And it doesn't even need to be a harmful case or buggy behavior: even newer capabilities of newer models (e.g. mythos) may get guarded against in the prompts used in agent harnesses (CC) or as system prompts, resulting in a more and more complex system prompt. This imposes something like "cognitive burden" on the model, which diverges further and further from what the eval measured.
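To make Theory 1 concrete, here's a minimal sketch of what aggressive quantization trades away. This is a toy symmetric per-tensor int8 scheme, not anything known about Anthropic's actual serving stack; all names and numbers here are illustrative.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: store weights as int8
    plus a single float scale, reconstructing them approximately."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction of the original fp32 weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)  # stand-in weight tensor
q, s = quantize_int8(w)
w_hat = dequantize(q, s)

print("memory ratio:", q.nbytes / w.nbytes)      # 0.25: int8 is 4x smaller than fp32
print("max abs error:", np.abs(w - w_hat).max()) # small but nonzero per weight
```

The memory (and, on suitable kernels, throughput) win is exactly why a squeezed provider would reach for this, and the per-weight reconstruction error, compounded across billions of parameters and many layers, is the plausible mechanism for "the model got subtly dumber."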
jubilanti | 5 hours ago
> So my question is: what is the actual reason Anthropic lobotomizes the model when the new one is about to be dropped?

You can only fit one version of a model in VRAM at a time. When you have fixed compute capacity shared between staging and production, you can put nearly all of it toward production most of the time. But when you need a staging deployment to run all the benchmarks and make sure everything works before shipping to prod, you have to pull machines off the prod stack and onto the staging stack. Since the new model isn't in prod yet, all your users are now flooding that smaller prod stack. The common assumption is that providers then try to hold throughput steady on less compute by aggressively quantizing or applying other optimizations. When that isn't enough, you get first longer delays, then sporadic 500 errors, and then downtime.
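The capacity squeeze described above is just arithmetic. A toy sketch, with entirely made-up fleet sizes and demand figures:

```python
def load_per_gpu(total_gpus, staging_gpus, demand_rps, speedup=1.0):
    """Requests/sec each remaining prod GPU must absorb once
    `staging_gpus` are pulled out of the prod pool, with an optional
    throughput multiplier from quantization or other optimizations."""
    prod_gpus = total_gpus - staging_gpus
    return demand_rps / (prod_gpus * speedup)

# hypothetical numbers: 1000 GPUs total, 10k req/s of steady demand
base     = load_per_gpu(1000,   0, 10_000)       # 10.0 req/s per GPU
squeezed = load_per_gpu(1000, 300, 10_000)       # ~14.3: same demand, fewer GPUs
patched  = load_per_gpu(1000, 300, 10_000, 1.5)  # ~9.5: quantization buys it back
```

With 30% of the fleet diverted to staging, each remaining GPU carries ~43% more load; a 1.5x throughput gain from quantization brings per-GPU load back under the baseline, at some cost in output quality.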