bratao 7 hours ago

It is super strange that in the last (3?) releases they keep comparing against older models such as Opus-4.6.

vessenes 7 hours ago

Some of it’s probably timing. Some of it is wanting to look good. That said, I just went to the claw-eval site, and neither 4.7 nor 5.5 from oAI is listed on the benchmarks. So there’s also just the time it takes others to get benchmarking done and published.

varispeed 6 hours ago

Opus-4.6 was probably the best model so far before it got nerfed. 4.7 is nowhere near the experience I had. In fact, I stopped using it completely because, more often than not, its output is just dumber than that of local models.

solenoid0937 2 hours ago

Opus 4.6 was never nerfed, that's FUD. There were harness-level problems that were fixed.

4.7 is much better. But perception is a funny thing: once you think something is bad, you start looking for it everywhere.

leonidasv 4 hours ago

Same here. Can't stand 4.7.

dyauspitr 5 hours ago

Because these can’t compete with SoTA, but they’re close.