Remix.run Logo
doctoboggan 2 days ago

Is it honestly better than Opus 4.6 or just benchmaxxed? Have you done any coding with an agent harness using it?

If its coding abilities are better than Claude Code with Opus 4.6 then I will definitely be switching to this model.

bokkies 2 days ago | parent | next [-]

Apparently glm5.1 and qwen coder latest is as good as opus 4.6 on benchmarks. So I tried both seriously for a week (glm Pro using CC) and qwen using qwen companion. Thought I could save $80 a month. Unfortunately after 2 days I had switched back to Max. The speed (slower on both although qwen is much faster) and errors (stupid layout mistakes, inserting 2 footers then refusing to remove one, not seeing obvious problems in screenshots & major f-ups of functionality), not being able to view URLs properly, etc. I'll give deepseek a go but I suspect it will be similar. The model is only half the story. Also been testing gpt5.4 with codex and it is very almost as good as CC... better on long running tasks running in background. Not keen on ChatGPT codex 'personality' so will stick to CC for the most part.

madagang 2 days ago | parent | prev | next [-]

Their Chinese announcement says that, based on internal employee testing, it is not as good as Opus 4.6 Thinking, but is slightly better than Opus 4.6 without Thinking enabled.

mchusma 2 days ago | parent | next [-]

I appreciate this, makes me trust it more than benchmarks.

ibic 2 days ago | parent | prev | next [-]

In case people wonder where the announcement is (you can easily translate it via browser if you don't read Chinese): https://mp.weixin.qq.com/s/8bxXqS2R8Fx5-1TLDBiEDg

It's still a "preview" version atm.

deaux 2 days ago | parent | prev | next [-]

That's super interesting, isn't Deepseek in China banned from using Anthropic models? Yet here they're comparing it in terms of internal employee testing.

computably 2 days ago | parent | next [-]

> That's super interesting, isn't Deepseek in China banned from using Anthropic models? Yet here they're comparing it in terms of internal employee testing.

I don't see why Deepseek would care to respect Anthropic's ToS, even if just to pretend. It's not like Anthropic could file and win a lawsuit in China, nor would the US likely ban Deepseek. And even if the US gov would've considered it, Anthropic is on their shitlist.

renticulous 2 days ago | parent | prev [-]

They use VPN to access. Even Google Deepmind uses Anthropic. There was a fight within Google as to why only DeepMind is allowed to Claude while rest of the Google can't.

anentropic 2 days ago | parent | prev [-]

Who uses Opus without thinking though...?

2 days ago | parent | prev [-]
[deleted]