About 90% of my coding is on Qwen 3.6 27b and Open Code with some custom skills and Semble. It is NOT as smart as CC or Codex, but it's enough to get most of my work done. I didn't set out to replace CC and Codex (I have an RTX 6000, so the TPS is faster than I care about, but the RTX 6000 was originally for other work). I only tried this as an experiment to see how close you could get to a frontier model for coding, but it was good enough that I stuck with it. I still fall back to Codex for really complicated stuff and to polish UIs, as that seems to be the weakest area when working in Qwen.

This isn't a recommendation, because I don't think most people have an RTX 6000 lying around and the cost would be many years of MAX CC or Codex subscriptions, but at least this seems possible. Maybe in a few more years it will even be practical.

Other Notes: I have had to set the compact target to 75% on a 256k context window, as once the conversation length goes above 100k I start seeing a drop in quality and speed. This becomes very problematic after about 150k. I tried Qwen 3.5 122b too, but it actually seems much worse at coding than 3.6 27b even though it's much larger. Maybe because I am using a 4-bit quant, or maybe I just don't have it configured correctly? I know 3.6 is newer, but I didn't expect it to outperform a much larger model from the prior generation. Gemma 4 31b is a good model for other tasks, but at least in my personal experience Qwen outperforms it in coding. Nemotron Super 120b is great at a lot of stuff, but it also seems to be not as good at coding as Qwen. This was very surprising to me.

▲ heipei 4 hours ago | parent | next [-]

Same here, I use Qwen 3.6 27b (Q6 quant) with llama.cpp on an RTX 5090 using the pi agent exclusively now. The fact that it's local means that I never have to think about token pricing, quotas, time of day, or data sensitivity. I have limited the GPU from 600W to 450W which means the system stays whisper quiet during inference.

I have become so "lazy" (in a good way) that I've started using the model for lots of mundane daily things on top of just coding:

  * "commit this on a branch, push, create a PR and assign $nickname for review"
  * "Use the Stripe CLI to download all open and overdue invoices and reconcile them with this CSV export from our bank account."
  * "Use these Elasticsearch credentials to summarise what kind of operations are causing load at the moment."
  * "Tell me if our codebase already supports X and where it's implemented."
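
The 600W-to-450W power cap mentioned above can be set with `nvidia-smi`. The comment does not say how it was actually applied, so this is only a minimal sketch, assuming a single NVIDIA GPU at index 0 and root privileges. The helper names `power_limit_cmd` and `apply_power_limit` are hypothetical; `-i` (GPU index) and `-pl` (power limit in watts) are standard `nvidia-smi` options.

```python
import subprocess


def power_limit_cmd(watts: int, gpu_index: int = 0) -> list[str]:
    """Build the nvidia-smi argv that caps one GPU's power draw (hypothetical helper)."""
    if watts <= 0:
        raise ValueError("power limit must be a positive number of watts")
    # -i selects the GPU by index, -pl sets the power limit in watts.
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def apply_power_limit(watts: int, gpu_index: int = 0, dry_run: bool = True) -> list[str]:
    """Return the command; only execute it when dry_run is False (needs root)."""
    cmd = power_limit_cmd(watts, gpu_index)
    if not dry_run:
        # check=True raises CalledProcessError if nvidia-smi rejects the value.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run: just show the command for the 450W cap from the comment.
    print(" ".join(apply_power_limit(450)))
```

The driver only accepts values inside the card's supported range, and the setting typically has to be reapplied after a reboot.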

	▲	amarshall 36 minutes ago | parent [-]
		What context length and kv cache quant (if any) are you using? And MTP?

▲ bo1024 4 hours ago | parent | prev | next [-]

Qwen3.5-122B is actually Qwen3.5-122B-A10B. The A10B means that this is a "mixture of experts" model where only 10B parameters are activated at a given time. Whereas Qwen3.6-27B is a "dense" model where all 27B parameters are activated all the time. So for many tasks, you'd expect the 27B dense model to be better than the 122B-A10B model.

▲ user43928 2 hours ago | parent | prev | next [-]

I am forced to use Qwen 3.6 27b at work and found it next to useless. I might as well do all the work manually rather than having it implement another mess or get the debugging entirely wrong.

It feels like anything less than Sonnet is just a waste of time, apart from use as a smarter search function.

It also strikes me as strange that you would mention Codex for UI polish, as it's notoriously bad at UI, and far behind Claude Opus. Altman specifically posted that they are working to improve this for the next model release.

▲ sejje an hour ago | parent [-]

It might be good at analysis & review, writing documentation, git commits, etc--even if it's not good at coding.

All the drudgery.

	▲	user43928 39 minutes ago | parent [-]
		Bad AI-written documentation and commits are not great, particularly when you work in a team. I almost find it offensive when colleagues open an MR with an obvious slop description that's frequently inaccurate. That said, I find AI useful for a lot of drudgery like resolving merge conflicts or splitting changes out into separate MRs. Particularly with the latter, I had issues with small models: they butchered the changes I wanted moved. Even on the second attempt, GPT 5.4 mini did not manage to move 10-20 lines to another file without modifying them in the process.

▲ htrp 4 hours ago | parent | prev [-]

why 27b vs 35b? Is MoE that much worse for coding?

	▲	amarshall 30 minutes ago | parent | next [-]
		You can take the geometric mean of an MoE model's total and active parameter counts to approximate the size of a dense model of equivalent quality. So sqrt(35*10)≈18.7. The trade-off is that an MoE is worse but faster than a dense model of the same total size.
	▲	electronsoup 3 hours ago | parent | prev [-]
		Yeah, MoE is a little worse for the same size, but you can often run bigger MoEs at respectable speeds even with CPU RAM offload. The dense models really need to be 100% in VRAM.
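
The geometric-mean rule of thumb from this subthread is easy to try out. A minimal sketch: the function name `dense_equivalent_b` is hypothetical, and the parameter counts are the ones quoted in the thread (in billions), not verified model specs.

```python
import math


def dense_equivalent_b(total_b: float, active_b: float) -> float:
    """Rough dense-equivalent size (billions) of an MoE model: sqrt(total * active)."""
    if total_b <= 0 or active_b <= 0:
        raise ValueError("parameter counts must be positive")
    if active_b > total_b:
        raise ValueError("active parameters cannot exceed total parameters")
    return math.sqrt(total_b * active_b)


if __name__ == "__main__":
    # (label, total params in B, active params in B), as quoted in the thread.
    examples = [
        ("35B total, 10B active", 35, 10),
        ("122B total, 10B active", 122, 10),
    ]
    for label, total, active in examples:
        print(f"{label}: ~{dense_equivalent_b(total, active):.1f}B dense-equivalent")
```

This is only a rough heuristic. It says nothing about quantization, training data, or model generation, all of which came up earlier in the thread as possible explanations for the observed differences.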