xlii a day ago

I've been trying out GLM 5.2 on some projects, and I have a few thoughts on it:

- it takes its sweet time to get code rolling; not the fastest model by any means

- it strays a lot during discovery/planning but then corrects itself

- it's not steering-friendly: it hallucinates things that it doesn't follow through on later

- its output is quite good

A sample use case: I was optimizing rendering on a Swift+Zig codebase. It choked on 5k data entries.

GLM 5.2 spent 20 minutes building benchmarks and extracting data, which frustrated me, so I blocked non-editing tool access and went AFK. After approx. 30 minutes I found that it had used the already-built benchmarks and some "conclusions" to optimize 3 choke points. Its output noted that it couldn't validate its suspicions and asked for more data.

The implementation worked well; it was idiomatic and non-intrusive. I would even say it was more idiomatic than GPT 5.5's changes to the same repo.

I would opt into using it more, BUT GPT usually completes the same requests 5x faster.

GLM 5.2 was the spark for setting up a way to run it inside isolated containers with JJ workspaces (so that multiple instances can run in parallel).
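For the parallel setup mentioned above, a minimal sketch of one way to wire it, assuming one jj workspace per agent and one detached Docker container per workspace. The image name, directory layout, and mount point are hypothetical placeholders, the image is assumed to start the agent on its own, and the script is assumed to run from the repo root. It only prints the commands unless `dry_run=False`.

```python
import shlex
import subprocess
from pathlib import Path

# Hypothetical values: placeholders, not the setup described in the comment.
IMAGE = "agent-sandbox:latest"
WORKSPACE_ROOT = Path("../agent-workspaces")


def plan_agent(n: int) -> list[list[str]]:
    """Commands that give agent `n` its own jj workspace and a container over it."""
    name = f"agent-{n}"
    ws = WORKSPACE_ROOT / name
    return [
        # Separate working copy backed by the same jj repo, so agents don't collide.
        ["jj", "workspace", "add", "--name", name, str(ws)],
        # Detached (-d) throwaway (--rm) container that only sees that workspace,
        # so several agents can run side by side.
        ["docker", "run", "-d", "--rm", "-v", f"{ws.resolve()}:/work", "-w", "/work", IMAGE],
    ]


def run_agents(count: int, dry_run: bool = True) -> list[list[str]]:
    """Print (or execute) the commands for `count` parallel agents."""
    cmds = [cmd for n in range(count) for cmd in plan_agent(n)]
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    run_agents(3)
```

Keeping each agent in its own jj workspace means every container edits a separate working copy, and the results can be reviewed or abandoned per workspace afterward.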

jeremyjh a day ago

It's also nice that you can see its entire reasoning trace. I can see it going off the rails, or see something I forgot to tell it, and stop and correct it. Or I'll learn WHY it made the choice it did and won't have to question it afterward.

jauntywundrkind a day ago

Strong agree! I deeply appreciate this aspect of GLM. Watching it think & being able to nudge early is incredibly useful. Being able to point at bad assumptions is incredibly useful. Watching what it's seeing is super informative.

It's always a shock to me how opaque most other models are!

It's also pretty resilient when you inject input while it's working: it doesn't go off course, or it gets back on track afterward, which I appreciate.

Sanzig a day ago

> It's always a shock to me how opaque most other models are!

This is (unfortunately) by design. The proprietary models hide their reasoning traces so they can't be used for model distillation. Sometimes even when they do show reasoning, it isn't the model's real trace - IIRC, someone was able to demonstrate that Opus' reasoning is usually a summary made with Haiku behind the scenes.

braebo a day ago

It is such a momentum killer being forced to stare at a silly word for 4 minutes instead of being able to read the thinking as it streams in. I can’t wait until I can drop Anthropic at work. Their UX sucks, intentionally, for anti competitive reasons like “don’t distill our model we trained on all the data & IP we stole and processed with the mass exploitation of data workers in the global south!”.

trollbridge a day ago

I used it the other day for something of low importance that other models simply weren't figuring out and I didn't want to burn up Opus 4.8 on. (It had to do with overriding left-click on a macOS menu bar and then making Ctrl+click or right click bring up the menu like left-click normally does, and doing all this conditionally.)

Switched the model to GLM-5.2 halfway through a troubleshooting session (didn't even bother to reprompt, just changed it in the middle of its reasoning), gave it a few minutes, and the problem was fixed. This is with the subscription-based allocation on OpenCode Go, whereas a problem like this would completely burn up my Opus allowance for the current 5 hours or even the current week.

nijave a day ago

> it takes its sweet time to get code rolling, not the fastest model by any means

Which provider are you using? I got a z.ai Lite Coding Plan, and my understanding is that z.ai is on the slower side of providers, with the Lite plan getting lower priority on top of that. In the API key console, it shows throughput dipping below 60 tok/sec, which is quite slow.

xlii 21 hours ago

I have Max access from a friend. It's not about token generation speed but about time-to-first-edit. It tends to think for 3-10 minutes before that.

Oras a day ago

Also pricing: I wanted to give it a try, but when it's only 30% cheaper than Opus, I wouldn't go for it with these issues.

nijave a day ago

The z.ai coding plan is a fairly decent deal at ~$16/month USD, considering it's supposed to have a fair bit more usage than the comparable $20/month Claude plan. On the other hand, z.ai seems a bit on the slower side for raw model tok/sec throughput.

chpatrick a day ago

Its pricing is a lot lower if you can run it yourself.

nijave a day ago

Not this one. It's a SOTA-class model; it requires >800 GiB of VRAM at fp8.
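A rough, weights-only way to see where a figure like this comes from: at fp8 each parameter takes 1 byte, so weight memory in GiB is about params / 2^30, before KV cache, activations, and runtime overhead. The thread doesn't give GLM 5.2's parameter count, so this sketch works backward from the 800 GiB figure instead of assuming one.

```python
# Bytes per parameter for common precisions (weights only).
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0}
GIB = 2**30


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """GiB needed for the weights alone; ignores KV cache, activations, overhead."""
    return num_params * BYTES_PER_PARAM[dtype] / GIB


def params_for_memory(gib: float, dtype: str) -> float:
    """Inverse: how many parameters fill `gib` GiB at the given precision."""
    return gib * GIB / BYTES_PER_PARAM[dtype]


n = params_for_memory(800, "fp8")
print(f"800 GiB of fp8 weights is about {n / 1e9:.0f}B parameters")
print(f"the same weights at fp16 would need {weight_memory_gib(n, 'fp16'):.0f} GiB")
```

Since the real requirement also includes KV cache for long coding contexts, the weights-only number is a lower bound on what a self-hosted deployment needs.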

jeremyjh a day ago

What?

It is less than 20% of the cost of Opus at API rates: $1.40/$4.40 vs $5/$25 (input/output, per million tokens).
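A quick check of the arithmetic, assuming the quoted figures are USD per million input/output tokens and ignoring cache discounts: GLM's output price is 17.6% of Opus's and its input price is 28%, so the blended ratio depends on the workload's input/output mix. The sketch also assumes both models use the same number of tokens for the job.

```python
# Prices as quoted in the comment, assumed to be USD per million tokens.
GLM_IN, GLM_OUT = 1.40, 4.40
OPUS_IN, OPUS_OUT = 5.00, 25.00


def job_cost(price_in: float, price_out: float, mtok_in: float, mtok_out: float) -> float:
    """Dollar cost of a job, with token counts in millions."""
    return price_in * mtok_in + price_out * mtok_out


def glm_to_opus_ratio(input_share: float) -> float:
    """GLM cost as a fraction of Opus cost for one million tokens,
    where `input_share` is the fraction of those tokens that are input."""
    mtok_in, mtok_out = input_share, 1.0 - input_share
    return job_cost(GLM_IN, GLM_OUT, mtok_in, mtok_out) / job_cost(
        OPUS_IN, OPUS_OUT, mtok_in, mtok_out
    )


for share in (0.0, 0.5, 0.9, 1.0):
    print(f"input share {share:.0%}: GLM costs {glm_to_opus_ratio(share):.1%} of Opus")
```

Under these assumptions the ratio runs from 17.6% (all output) to 28% (all input), and it stays below 20% as long as input makes up less than 60% of the tokens.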

cmrdporcupine a day ago

Not when you factor in token efficiency. It burns a lot more tokens to do the same job, so when I compared it to GPT 5.5 I frankly wasn't much ahead, and its thinking was weaker.

It may make sense if you have z.ai's (not particularly well-priced) subscription plan, but it's not competitive against an OpenAI or Anthropic monthly coding subscription. I burned through almost $10 worth of tokens doing just an hour of work.
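The token-efficiency point as arithmetic: if one model needs k times as many tokens for the same job, its effective cost relative to another model is k times the per-token price ratio, so a lower sticker price only wins below a break-even multiplier. GPT 5.5 rates aren't given in the thread, so this sketch uses the quoted output prices for GLM and Opus ($4.40 vs $25 per million tokens) as the example; the multipliers in the loop are hypothetical.

```python
def effective_cost_ratio(token_multiplier: float, price_ratio: float) -> float:
    """Cost of model A relative to model B for the same job.

    token_multiplier: how many times more tokens A uses than B (hypothetical).
    price_ratio: A's per-token price divided by B's.
    """
    return token_multiplier * price_ratio


def break_even_multiplier(price_ratio: float) -> float:
    """Token multiplier at which A stops being cheaper than B."""
    return 1.0 / price_ratio


# Output-token prices quoted in the thread: GLM $4.40 vs Opus $25.
ratio = 4.40 / 25.00
print(f"GLM stays cheaper until it uses {break_even_multiplier(ratio):.2f}x the tokens")
for k in (1, 2, 4):
    print(f"{k}x the tokens -> {effective_cost_ratio(k, ratio):.0%} of Opus's cost")
```

Against a subscription plan the comparison is harder, since the effective per-token price of a flat-rate plan isn't published.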

Sanzig a day ago

Take a look at Ollama Cloud: https://ollama.com/pricing

You get access to a whole bunch of bleeding edge open models including GLM-5.2, Kimi K2.7, DeepSeek 4 Pro, etc. Inference is run on US/SG/EU cloud providers with zero data retention policies. The $20/mo tier is very generous, in my experience.

jeremyjh a day ago

They don’t have a statement about where the GLM-5.2 model is run or about its data retention. They do state that for others, like MiniMax.

Sanzig a day ago

There's a blanket statement at the bottom of the pricing page, which I would hope also applies to GLM-5.2:

> Where are models hosted?

> Ollama hosts models and compute resources primarily in the United States. To serve global demand, we may route to Europe and Singapore for additional capacity.

> Is my prompt or response data trained on?

> Prompt or response data is never logged or trained on.

> Who does Ollama partner with to host models?

> Ollama collaborates with NVIDIA Cloud Providers (NCPs) to host open models.

> When Ollama partners with providers, we require no logging, no training, and zero data retention policies in place.

cmrdporcupine 19 hours ago

Well, I tried the $20/mo tier, used GLM specifically, and did maybe 3-4 hours of work. I'm already through 50% of my monthly quota and have blown through my time-limited quota twice. I won't renew for another month.

Which I think only underscores my point that the GLM models are actually not very cost-effective.

They essentially cost the same as the SOTA models from OpenAI and Anthropic, while not being quite as smart. I could have gotten about the same amount of work done on the $20 Codex plan. And I had to use my $100 Codex plan to finish the work GLM started before it ran out of quota, and also to fix it, since GLM left a bit of a mess.

I like that GLM exists, but other Chinese models are far more cost-effective. GLM is expensive, even on a fixed plan.

jeremyjh 3 hours ago

Ollama can’t meaningfully subsidize their subscriptions; there is no business case to do so because they are a commodity host. If you want to compare subsidized subscription value, you would need to compare with z.ai’s plans. One problem with any comparison is that they are all very opaque in terms of usage, and the plans change a lot over time. I got on Pro at $30 a month, so it’s a very good value: compared to the $20 Claude/Codex plans, I get at least 10x the usage, and I use all 3 regularly. At today’s prices, Codex Pro ($100) is likely a better value.

But if you are building a product or in an enterprise environment where you essentially have to pay API rates then GLM is the best value hands down.

Imanari a day ago

This mirrors my experience. I have been using it in Pi. It is smart and output is good but it is not efficient in getting there.

ju-st a day ago

Which thinking level? Max or high?