I have never noticed a degradation in either Claude or OpenAI models, and the benchmarks people set up have never shown a statistically significant deviation either: https://marginlab.ai/trackers/claude-code

Yet the same claim is being posted every single day, including new claims that the Fable 5 model has degraded compared to the initial release, guardrails aside.

▲

embedding-shape an hour ago | parent [-]

Almost slipping into conspiracy territory, but without insights into what the labs actually do internally, hard not to:

Anyways, heard about A/B testing before? ML people tend to like it a lot, hard to imagine neither OpenAI or Anthropic are already deep into categorizing people into buckets and running an wild amount of A/B testing all over the place, especially in the weeks leading up to new model releases, in various ways.

	▲	user43928 an hour ago \| parent [-]
		Yes, and we can see A/B testing on the ChatGPT website all the time. They are also testing the new models in their coding tools with select customers first. People working at OpenAI have publicly denied that they are performing any kind of hidden routing or quantization of models after release for Codex. I tend to believe them.