| ▲ | user43928 an hour ago | |||||||
I have never noticed any degradation in either Claude or OpenAI models, and the benchmarks people have set up have never shown a statistically significant deviation either (see https://marginlab.ai/trackers/claude-code). Yet the same claim is posted every single day, including new claims that the Fable 5 model has degraded since its initial release, guardrails aside. | ||||||||
| ▲ | embedding-shape an hour ago | parent [-] | |||||||
This is almost slipping into conspiracy territory, but without insight into what the labs actually do internally, it's hard not to. Anyway, heard of A/B testing before? ML people tend to like it a lot. It's hard to imagine that OpenAI and Anthropic aren't already deep into sorting people into buckets and running a wild amount of A/B testing all over the place, in various ways, especially in the weeks leading up to new model releases. | ||||||||
| ||||||||