stavros 6 hours ago

If K2 or GLM 5.2 are on par with Opus 4.8 I'll eat my hat. They're good, but they're not that good. Deepseek V4 Pro has been better than Sonnet for me, but the only model that comes close to or surpasses Opus 4.8 is GPT-5.5.

Aeolun 3 hours ago | parent | next [-]

GLM 5.2 is far better than deepseek V4. Seriously feels like I’m talking to a Claude model. Also burns tokens like one, so there is that. Deepseek is unbeatable on price/quality.

fjsoxjdnwk 5 hours ago | parent | prev [-]

Honestly just give it time. This stuff moves so fast that next month the conversation will be different. For folks who don’t like the ID privacy issues, use Deepseek et al. and it should be able to get the job done, even if the experience takes a bit more wrangling.

The problem with the ID verification is that they can pair introspective conversations with ID. Either that bothers people or it doesn’t.

Main point: we shouldn’t fret about the current state of the models, because the ID verification has future implications. Models will change and the competition will catch up. Do what feels right in the long run, rather than deciding based on whether TODAY’S model is better at Anthropic.

stavros 5 hours ago | parent [-]

I agree with this; my disagreement was strictly with the claim that the current open models are as good as Opus.

chrsw 3 hours ago | parent [-]

They're not. And by the time they are, OpenAI and Anthropic will probably be onto the next thing.

Not sure what happened to Google in all this. They're falling out of the frontier race.

HDBaseT 3 hours ago | parent [-]

Both Anthropic and OpenAI don't want to continue training models indefinitely.

Anthropic's CEO has talked about potentially slowing down on model training. There is little return on billions of dollars burnt for a 1-2% increase on various benchmarks. These companies profit via inference.

Not to mention, the whole thing with Fable being banned by the US Gov is a scary prospect for future models. What is the point of spending billions if it's going to get blocked?

chrsw an hour ago | parent [-]

Of course this can't go on forever. Especially not on LLMs. But are we really close to the limits of what these LLMs can do? I'm not sure we are.

The difference between GPT-5/Opus 4 and GPT-5.5/Opus 4.8 is striking. For software development anyway, there's no comparison. And all this has happened in a year.

My assumption is there will be another 2-3 years of improvements ahead of us on LLMs alone. Through hardware upgrades, larger training runs, better data quality, better algorithms, etc.

Of course, by then these models will be quite expensive. Will my company pay for it? I don't know. I'm sure some people will though.