throwaw12 3 hours ago

Qwen team, please please please, release something to outperform and surpass the coding abilities of Opus 4.5.

Although I like the model, I don't like the leadership of that company, how closed it is, and how divisive they are in terms of politics.

mortsnort 3 hours ago | parent | next [-]

They were just waiting for someone in the comments to ask!

mhuffman 2 hours ago | parent [-]

It really is the best way to incentivize politeness!

mohsen1 20 minutes ago | parent | prev | next [-]

With a good harness I am getting similar results with GLM 4.7. I am paying for TWO! max accounts and my agents are running 24/7.

I still have a small Claude account to do some code reviews. Opus 4.5 does good reviews but at this point GLM 4.7 usually can do the same code reviews.

If cost is an issue (for me it is, I pay out of pocket), go with GLM 4.7.

WarmWash 3 hours ago | parent | prev | next [-]

The Chinese labs distill the SOTA models to boost the performance of theirs. They are a trailer hooked up (with a 3-6 month long chain) to the trucks pushing the technology forward. I've yet to see a trailer overtake its truck.

China would need an architectural breakthrough to leapfrog American labs given the huge compute disparity.

miklosz 2 hours ago | parent | next [-]

I have indeed seen a trailer overtake its truck. Not a pretty sight.

digdugdirk an hour ago | parent [-]

Agreed. I do think the metaphor still holds though.

A financial jackknifing of the AI industry seems to be one very plausible outcome as the promises and expectations of the AI companies start meeting reality.

overfeed an hour ago | parent | prev | next [-]

Care to explain how the volume of AI research papers authored by Chinese researchers[1] has exceeded US-published ones? Time-traveling plagiarism perhaps, since you believe the US is destined to always lead.

1. Chinese researchers in China, to be more specific.

bfeynman an hour ago | parent | next [-]

Not a great metric; research in academia doesn't necessarily translate to value. In the US, companies have poached so many academics because of how much value they directly translate to.

jacquesm an hour ago | parent | prev [-]

Volume is easy: they have far more people. It is quality that counts.

aaa_aaa 3 hours ago | parent | prev | next [-]

No, all they need is time. I am awaiting the downfall of the AI hegemony and hype with popcorn at hand.

mhuffman 2 hours ago | parent | prev [-]

I would be happy with an open-weight, 3-month-old Claude.

cmrdporcupine 2 hours ago | parent [-]

DeepSeek 3.2 is frankly fairly close to that. GLM 4.7 as well. They're basically around Sonnet 4 level.

TylerLives 3 hours ago | parent | prev | next [-]

>how divisive they're in terms of politics

What do you mean by this?

throwaw12 3 hours ago | parent | next [-]

Dario said some not-so-nice things about China and open models in general:

https://www.bloomberg.com/news/articles/2026-01-20/anthropic...

vlovich123 3 hours ago | parent | next [-]

I think the least politically divisive issue within the US is concern about China’s growth as it directly threatens the US’s ability to set the world’s agenda. It may be politically divisive if you are aligned with Chinese interests but I don’t see anything politically divisive for a US audience. I expect Chinese CEOs speak in similar terms to a Chinese audience in terms of making sure they’re decoupled from the now unstable US political machine.

cmrdporcupine 2 hours ago | parent [-]

"... for a US audience"

And that's the rub.

Many of us are not.

giancarlostoro 2 hours ago | parent | prev | next [-]

From the perspective of competing against China in terms of AI, the argument against open models makes sense to me. It’s a terrible problem to have, really. Ideally we should all be able to work together in the sandbox towards a better tomorrow, but that’s not reality.

I prefer to have more open models. On the other hand, China closes up their open models once they start to show a competitive edge.

Levitz 2 hours ago | parent | prev [-]

I mean, there's no way it's about this, right?

Being critical of favorable actions towards a rival country shouldn't be divisive, and if it is, well, I don't think the problem is in the criticism.

Also, the link doesn't mention open source? From a Google search, he doesn't seem to care much for it.

Balinares 2 hours ago | parent | prev [-]

They're supporters of the Trump administration's military, a view which is not universally lauded.

pseudony 2 hours ago | parent | prev | next [-]

Same issue (I am Danish).

Have you tested alternatives? I grabbed Open Code and a MiniMax M2.1 subscription, even just the 10 USD/mo one to test with.

Result? We designed a spec for a slight variation of a tool I had previously specced with Claude: same problem (a process supervisor tool), from scratch.

Honestly, it worked great. I have played a little further with generating code (this time Go) and, again, I am happy.

Beyond that, GLM 4.7 should also be great.

See https://dev.to/kilocode/open-weight-models-are-getting-serio...

It is a recent case study of vibe-coding a smaller tool with Kilo Code, comparing output from MiniMax M2.1 and GLM 4.7.

Honestly, just give it a whirl - no need to send money to companies/nations you disagree with.

nunodonato an hour ago | parent [-]

I've been using GLM 4.7 with Claude Code. Best of both worlds. Canceled my Anthropic subscription due to the US politics as well. I already started my "withdrawal" in Jan 2025; Anthropic was one of the few that was left.

bigyabai an hour ago | parent [-]

I'm in the same boat. Sonnet was overkill for me, and GLM is cheap and smart enough to spit out boilerplate and FFmpeg commands whenever it's asked.

$20/month is a bit of an insane ask when the most valuable thing Anthropic makes is the free Claude Code CLI.

amrrs 3 hours ago | parent | prev | next [-]

Have you tried the new GLM 4.7?

davely 2 hours ago | parent | next [-]

I've been using GLM 4.7 alongside Opus 4.5 and I can't believe how bad it is. Seriously.

I spent 20 minutes yesterday trying to get GLM 4.7 to understand that a simple modal on a web page (vanilla JS and HTML!) wasn't displaying when a certain button was clicked. I hooked it up to Chrome MCP in Open Code as well.

It constantly told me that it fixed the problem. In frustration, I opened Claude Code and just typed "Why won't the button with ID 'edit' work???!"

It fixed the problem in one shot. This isn't even a hard problem (and I could have just fixed it myself but I guess sunk cost fallacy).

bityard 2 hours ago | parent | next [-]

I've used a bunch of the SOTA models (via my work's Windsurf subscription) for HTML/CSS/JS stuff over the past few months. Mind you, I am not a web developer, these are just internal and personal projects.

My experience is that all of the models seem to do a decent job of writing a whole application from scratch, up to a certain point of complexity. But as soon as you ask them for non-trivial modifications and bugfixes, they _usually_ go deep into rationalized rabbit holes into nowhere.

I burned through a lot of credits to try them all and Gemini tended to work the best for the things I was doing. But as always, YMMV.

KolmogorovComp 2 hours ago | parent [-]

Exactly the same feedback

Balinares an hour ago | parent | prev [-]

Amazingly, just yesterday, I had Opus 4.5 crap itself extensively on a fairly simple problem -- it was trying to override a column with an aggregation function while also using it in a group-by without referring to the original column by its fully qualified name prefixed with the table -- and in typical Claude fashion it assembled an entire abstraction layer to try and hide the problem under, before finally giving up, deleting the column, and smugly informing me I didn't need it anyway.

That evening, for kicks, I brought the problem to GLM 4.7 Flash (Flash!) and it one-shot the right solution.

It's not apples to apples, because when it comes down to it LLMs are statistical token extruders, and it's a lot easier to extrude the likely tokens from an isolated query than from a whole workspace that's already been messed up somewhat by said LLM. That, and data is not the plural of anecdote. But still, I'm easily amused, and this amused me. (I haven't otherwise pushed GLM 4.7 much and I don't have a strong opinion about it.)

But seriously, given the consistent pattern of knitting ever larger carpets to sweep errors under that Claude seems to exhibit over and over instead of identifying and addressing root causes, I'm curious what the codebases of people who use it a lot look like.

throwaw12 3 hours ago | parent | prev [-]

Yes, I did. It's not on par with Opus 4.5.

I use Opus 4.5 for planning; when I reach my usage limits, I fall back to GLM 4.7 only for implementing the plan. It still struggles, even though I configure GLM 4.7 as both the smaller model and the heavier model in Claude Code.

Onavo 2 hours ago | parent | prev | next [-]

Well, DeepSeek V4 is rumored to be in that range and will be released in 3 weeks.

sampton 3 hours ago | parent | prev [-]

Every time Dario opens his mouth it's something weird.