The terminal bench scores look weak but nice otherwise. I hope once the benchmarks are saturated, companies can focus on shrinking the models. Until then, let the games continue.

▲

anonzzzies 4 hours ago | parent | next [-]

Shrinking and speed; speed is a major thing. Claude Code is very good but just too slow: it has no reasonable way to handle simple requests because of the overhead, so everything should just be faster. If I were Anthropic, I would've bought Groq or Cerebras by now. Not sure if they (or the other big ones) are working on similar inference hardware to provide 2000 tok/s or more.

▲

theshrike79 7 hours ago | parent | prev | next [-]

z.ai models are crazy cheap. The one year lite plan is like 30€ (on sale though).

Complete no-brainer to get it as a backup with Crush. I've been using it for read-only analysis and implementing already planned tasks with pretty good results. It has a slight habit of expanding scope without being asked. Sometimes it's a good thing, sometimes it does useless work or messes things up a bit.

▲

maxdo 6 hours ago | parent | next [-]

I tried several times. In my personal experience it is no match for Claude models. There's almost no place for a second spot, from my point of view. You are doing things for work; each bug is hours of work, a potentially lost customer, etc. Why would you trust your money … just to back up?

▲

theshrike79 4 hours ago | parent | next [-]

I'm using it for my own stuff and I'm definitely not dropping however much it costs for the Claude Max plans.

That's why I usually use Claude for planning, feed the issues to beads or a markdown file and then have Codex or Crush+GLM implement them.

For exploratory stuff I'm "pair-programming" with Claude.

At work we have all the toys, but I'm not putting my own code through them =)

▲

maxdo 2 hours ago | parent [-]

It's beyond me why you need Max plans. I use Opus/Sonnet/Gemini/GPT 5.2 every day in Cursor and I'm not paying for Claude Max.

▲

ewoodrich 2 hours ago | parent | prev [-]

It's a perfectly serviceable fallback when Claude Code kicks me off in the middle of an edit on the Pro plan (which happens constantly to me now) and I just want to finish tweaking some CSS styles or whatever to wrap up. If you have a legitimate concern about losing customers then yes, you're probably in the wrong target market for a $3/mo plan...

▲

maxdo 2 hours ago | parent | next [-]

You can get Cursor for $20 USD/mo with cutting-edge models and pay per token for extra use when you need it. Most of the time you will be OK within the basic Cursor plans, and you don't need to stick with one vendor. Today Claude is good, awesome; tomorrow Google is good, great.

I sometimes even ask several models to see which suggestion is best, or even mix two. Especially during bugfixes.

▲

skippyboxedhero an hour ago | parent | prev [-]

Openrouter with OpenCode.

▲

ewoodrich 30 minutes ago | parent [-]

I've gone down that route already with Roo/Kilo Code and then OpenCode, but these days I use OpenCode with the z.ai backend and/or the CC z.ai Anthropic-compatible endpoint, although I've been moving to OC in general more and more over time. GLM 4.6 with the Z.ai plan (haven't tried 4.7 yet) has worked well enough for straightforward changes, with a relatively large quota (more generous than CC, which only gets more frustrating on the Pro plan over time) and predictable billing, which is a big pro for me. I just got tired of having to police my OpenRouter usage to avoid burning through my credits.

But yes, OpenCode is awesome, particularly as it supports all the subscriptions I have access to via personal or work (GitHub Copilot/CC/z.ai). And as model churn/competition slows down over time, I can stick with whichever ends up having the best value/performance with sufficient quota for my personal projects, without fear of lock-in and enshittification.

▲

sh3rl0ck 6 hours ago | parent | prev | next [-]

I shifted from Crush to Opencode this week because Crush doesn't seem to be evolving in its utility; having a plan mode, subagents etc seems to not be a thing they're working on at the mo.

I'd love to hear your insight though, because maybe I just configured things wrong haha

▲

allovertheworld 4 hours ago | parent | prev [-]

This doesn't mean much if you hit daily limits quickly anyway, so the API pricing matters more.

▲

CuriouslyC 8 hours ago | parent | prev | next [-]

We're not gonna see significant model shrinkage until the money tap dries up. Between now and then, we'll see new benchmarks/evals that push the holes in model capabilities in cycles as they saturate each new round.

▲

lanthissa 7 hours ago | parent [-]

Isn't Gemini 3 Flash already model shrinkage that does well in coding?

▲

skippyboxedhero an hour ago | parent | next [-]

Xiaomi, Nvidia Nemotron, Minimax, lots of other smaller ones too. There are massive economic incentives to shrink models because they can be provided faster and at lower cost.

I think even with the money going in, there has to be some revenue supporting that development somewhere. And users are now looking at the cost. I have been using Anthropic Max for most of this year; after checking out some of these other models, it is clearly overpriced (I would also say their moat of Claude Code has been breached). And Anthropic's API pricing is completely crazy when you use some of the paradigms that they suggest (agents/commands/etc.), i.e. token usage is going up, so efficient models are driving growth.

▲

hedgehog 7 hours ago | parent | prev | next [-]

Smaller open-weights models are also improving noticeably (like Qwen3 Coder 30B); the improvements are happening at all sizes.

▲

cmrdporcupine 7 hours ago | parent [-]

Devstral Small 24b looks promising as something I want to try fine tuning on DSLs, etc. and then embedding in tooling.

▲

hedgehog 4 hours ago | parent [-]

I haven't tried it yet, but yes. Qwen3 Next 80B works decently in my testing, and fast. I had mixed results with the new Nemotron, but it and the new Qwen models are both very fast to run.

▲

Imustaskforhelp 6 hours ago | parent | prev [-]

How many billion parameters does Gemini 3 Flash have? I can't seem to find info about it online.

▲

bigyabai 7 hours ago | parent | prev [-]

It's a good model, for what it is. Z.ai's big business prop is that you can get Claude Code with their GLM models at much lower prices than what Anthropic charges. This model is going to be great for that agentic coding application.

▲

maxdo 6 hours ago | parent [-]

… and wake up every night because you saved a few dollars, there are bugs, and they are due to this decision?

▲

bigyabai 4 hours ago | parent | next [-]

I pay for both Claude and Z.ai right now, and GLM-4.7 is more than capable for what I need. Opus 4.5 is nice but not worth the quota cost for most tasks.

▲

Imustaskforhelp 6 hours ago | parent | prev [-]

Well, I feel like all models are converging. Maybe Claude is good, but only time will tell, as Gemini Flash and GLM put pressure on Claude/Anthropic models.

People (here) are definitely comparing it to Sonnet, so if you take this stance on saving a few dollars, I am sure you must hold the same opinion about Opus: nobody should use Sonnet either.

Personally, I am interested in open source models because they would have genuine value and competition after the bubble bursts.