I am still very new to the open-weight/source models. If anyone is using them full-time, I’d really love to hear about the setup and how they perform, as I am considering moving my org off Anthropic products.

marcyb5st 4 hours ago

Anecdotal, but here's my experience.

For personal stuff I use forgecode with OpenRouter. First, forgecode is a much better harness than Claude Code (IMHO).

As for the models, in my experience there isn't much difference in quality, but the cost difference is insane, at least for how I use agents. Here's yesterday's example: I'm developing a small DSL for searching across complex technical documents. I wanted to add a small operator to it and thought I'd give Fable a spin. It burned through 13 USD, and while it delivered a solution, it wasn't objectively better than what DeepSeek V4 did for 1.7 USD on the exact same task (I ran both because I was curious).

For full disclosure, I ask agents for piecemeal work. In the DSL case, for example, I designed the operators and then asked agents to implement them one by one. If I had asked it to design the whole thing starting from these complex documents, Fable would probably shine, but every time I give agents broader-scope tasks they burn through millions of tokens and generate questionable code that I then have to spend time familiarizing myself with.

	sroerick 2 hours ago
		I'm also using DSLs a lot as an architecture pattern. I'd be curious to know what stack you're using for this and how you're approaching it.

DragonBooster 5 hours ago

These models have open weights, but at the moment most flagship models are practically accessible only through third-party model providers. The main exception is models in the ~30B parameter range, which can still be run on consumer-grade GPUs. That said, even consumer GPUs have become increasingly expensive and difficult to justify in recent years.

	mirekrusin 5 hours ago
		You can definitely go above 30B on consumer hardware: two GPUs, a Spark, a Mac, half-byte (4-bit) quants, etc.
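
To put rough numbers on the "30B on consumer hardware" point: the memory needed for the weights is roughly parameter count × bits per weight ÷ 8. The minimal Python sketch below assumes exactly that simple formula plus a hypothetical 20% overhead factor for KV cache and runtime buffers; real overhead depends on context length and the runtime, and the function names are illustrative only.

```python
# Back-of-the-envelope memory estimate for running a model locally.
# Assumption: weights take params * bits_per_weight / 8 bytes; the overhead
# factor is a hypothetical allowance for KV cache and runtime buffers.


def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def fits_in_memory(params_billion: float, bits_per_weight: float,
                   memory_gb: float, overhead: float = 0.2) -> bool:
    """True if the weights plus the assumed overhead fit in the given memory."""
    needed = weight_memory_gb(params_billion, bits_per_weight) * (1 + overhead)
    return needed <= memory_gb


if __name__ == "__main__":
    # 16-bit, 8-bit, and 4-bit ("half-byte") weights for a few model sizes.
    for params in (30, 35, 70):
        for bits in (16, 8, 4):
            gb = weight_memory_gb(params, bits)
            print(f"{params}B @ {bits}-bit: ~{gb:.1f} GB of weights, "
                  f"fits in 32 GB: {fits_in_memory(params, bits, 32)}")
```

Under these assumptions a 35B model needs about 35 GB for its weights at 8-bit but about 17.5 GB at 4-bit, which is why half-byte quants bring that size class within reach of a 32 GB card.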

sdesol 4 hours ago

I created https://github.com/gitsense/gsc-cli, and I would say GLM-4.7 accounts for about 80% of its code.

If you look at a file like:

https://github.com/gitsense/gsc-cli/blob/main/internal/cli/r...

you can see that I attribute the models used. I found that 4.7 was not very good at Go code, which is why `Gemini 3 Flash` starts to appear in the attributions.

4.7 is what Cerebras provides, and for me, fast iteration is what matters most. Having played around with MiMo v2.5.0-Pro, I am 100% sure it could have done what Gemini 3 Flash did.

There were a few points where I got stuck and needed Sonnet to explain things to me, but I think the dirty secret Anthropic and OpenAI won't tell you is that if you know how to code, these models are honestly good enough.

Based on my experience with MiMo and what others are saying about GLM 5.1, we are now in a hardware race. The Chinese models are a 100% drop-in replacement for Claude if you know how to program and want AI to amplify what you know. What I'll be weighing now is which provider offers the fastest inference.

MiMo-v2.5.0-Pro-Ultraspeed is really good at generating good results quickly, and at burning through your money just as fast.

kamranjon 4 hours ago

I have been using DeepSeek V4 Flash as my main model for everything ever since dwarf star came out. I run it on my M4 Max MacBook Pro with 128 GB of memory, usually as a server, and connect to it over Tailscale from my coding machine using the Pi coding agent. It’s a big leap over the Qwen models, though it doesn’t have vision, so I still run those for vision tasks. GLM 4.7 Flash was my previous go-to for coding, but I’ve completely switched to DeepSeek for everything that doesn’t need vision.

andai 5 hours ago

I keep trying to switch to the Chinese models, but I keep finding myself asking Claude to fix their output (both functionality and style), so I always end up switching back.[0]

I also keep trying GPT, which is quite solid. Very fast, great at debugging. But its code is often overly clever and hurts my brain.

(Maybe fixable with prompting. I tried, and it helped the Chinese ones a bit. Just tell them to be elegant, like in the old image AI days: "+good -bad"!)

For now I do still need my human brain to actually be able to make sense of the stuff, and Claude is the only one that consistently meets that requirement.

But I am hoping that one of these days, one of the Chinese labs figures out the special sauce :)

[0] (For smallish edits, though, I am having a great time with DeepSeek Flash. Practically unlimited AI on tap! How cool is that.)

scottcha 5 hours ago

I use GLM 5.1 plus Pi with a few customized skills and am very happy with it. I hadn’t touched my Claude 5x plan for a couple of weeks, but when Fable was released I opened it back up in Claude Code, did a few tasks, and was still happy to return to GLM/Pi.

	sebastianconcpt 4 hours ago
		Better than Qwen3.6-35B-A3B-8bit? When I tried GLM, I found it way, way slower (with omlx as the runtime).

trollbridge 4 hours ago

Qwen 3.6 seems to be the strongest local model; it works OK on an RTX 5090 or a Mac with more than 32 GB of memory.

polski-g 3 hours ago

I used GLM 5/5.1 for 60 days. Certainly better than Sonnet 4.6, but not as good as Opus or GPT.

Use the DCP or Magic Context plugin in OpenCode to keep the context below 160k tokens and you're fine.
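
How DCP and Magic Context actually manage context isn't described here, so the following is not their implementation. It is a minimal Python sketch of the general idea of keeping a conversation under a token budget by dropping the oldest turns. The `Message` type, `estimate_tokens`, `trim_history`, and the ~4-characters-per-token heuristic are all hypothetical; a real setup would count tokens with the model's own tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str


def estimate_tokens(text: str) -> int:
    # Hypothetical heuristic: roughly 4 characters per token.
    # Real tokenizers differ; use the model's tokenizer in practice.
    return max(1, len(text) // 4)


def trim_history(messages: list[Message], budget_tokens: int = 160_000,
                 keep_first: int = 1) -> list[Message]:
    """Drop the oldest unpinned messages until the estimate fits the budget.

    The first `keep_first` messages (e.g. a system prompt) are always kept,
    even if they alone exceed the budget.
    """
    pinned = messages[:keep_first]
    rest = list(messages[keep_first:])
    used = sum(estimate_tokens(m.content) for m in messages)
    while rest and used > budget_tokens:
        oldest = rest.pop(0)  # oldest unpinned turn goes first
        used -= estimate_tokens(oldest.content)
    return pinned + rest


if __name__ == "__main__":
    # A 10-token system prompt followed by five 100-token user turns.
    history = [Message("system", "x" * 40)] + [
        Message("user", "y" * 400) for _ in range(5)
    ]
    trimmed = trim_history(history, budget_tokens=260)
    print(len(trimmed))  # prints 3: system prompt + the two newest turns
```

Dropping whole turns from the front is the simplest policy; summarizing older turns instead keeps more information at the cost of an extra model call.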