embedding-shape 2 hours ago

> especially in the light of GPT 5.6 very likely coming out next week

Finally have an explanation why GPT 5.5 xhigh felt dumber and dumber these last few weeks, always the same thing when a new model release is about to come out...

toxik an hour ago | parent [-]

Opus has been extremely stupid recently, reckon that's because Fable needs to look appealing?

user43928 38 minutes ago | parent [-]

I have never noticed a degradation in either Claude or OpenAI models, and the benchmarks people set up have never shown a statistically significant deviation either: https://marginlab.ai/trackers/claude-code

Yet the same claim is being posted every single day, including new claims that the Fable 5 model has degraded compared to the initial release, guardrails aside.
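
For anyone wondering what "statistically significant deviation" means in practice here, a rough sketch: compare the benchmark pass rate from two time windows with a two-proportion z-test. This is only an illustration, not how the linked tracker works; the function name and the pass counts below are made up.

```python
import math

def two_proportion_z_test(pass_a, n_a, pass_b, n_b):
    """Two-sided z-test for a difference between two pass rates.

    pass_a / n_a: passes and total runs in window A (e.g. at release).
    pass_b / n_b: passes and total runs in window B (e.g. this week).
    Returns (z, p_value). All inputs here are hypothetical.
    """
    p_a = pass_a / n_a
    p_b = pass_b / n_b
    # Pooled pass rate under the null hypothesis of "no change".
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Every run passed or every run failed in both windows.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up numbers: 58/100 passes at release vs 52/100 this week.
z, p = two_proportion_z_test(58, 100, 52, 100)
print(f"z={z:.2f}, p={p:.3f}")
```

With only 100 runs per window, a six-point drop like this is well within noise (p is far above 0.05), which is why "it feels dumber" is hard to confirm or refute without a lot of samples.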

embedding-shape 36 minutes ago | parent [-]

Almost slipping into conspiracy territory, but without insight into what the labs actually do internally, it's hard not to.

Anyways, heard about A/B testing before? ML people tend to like it a lot. It's hard to imagine that OpenAI and Anthropic aren't already deep into sorting people into buckets and running a wild amount of A/B tests all over the place, in various ways, especially in the weeks leading up to new model releases.

user43928 15 minutes ago | parent [-]

Yes, and we can see A/B testing on the ChatGPT website all the time.

They are also testing the new models in their coding tools with select customers first.

People working at OpenAI have publicly denied that they are performing any kind of hidden routing or quantization of models after release for Codex. I tend to believe them.