noodletheworld 2 days ago
Having tried it: Qwen is really good. Also, generally, it makes sense. 8B models are generally not very good.^ That this 8B model is decent is impressive, but that it could perform on par with a good model 4 times as large is a daydream.

^ To be polite. Small models + tool use for coding agents are almost universally ass. Proof: my personal experience. I've tried many of them.
meatmanek 2 days ago
It's not that surprising that an 8B dense model would compete with a 35B-A3B MoE model. The geometric mean rule of thumb for MoE models is that the intelligence level of an MoE model with T total parameters and A active parameters is roughly equivalent to that of a dense model with sqrt(A*T) parameters. For qwen3.6-35B-A3B, that equivalent size is about 10.25B, within spitting distance of an 8B model. Good training can make up the ~28% difference in size.
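The arithmetic behind the rule of thumb above can be sketched in a few lines. This is just the commenter's heuristic made explicit, not an established law; the function name is made up for illustration.

```python
import math

def moe_dense_equivalent(total_params_b: float, active_params_b: float) -> float:
    """Geometric-mean rule of thumb: an MoE model with T total and A active
    parameters performs roughly like a dense model of sqrt(A * T) parameters.
    Arguments and return value are in billions of parameters."""
    return math.sqrt(total_params_b * active_params_b)

# 35B total, 3B active, as in the 35B-A3B model discussed above:
equivalent = moe_dense_equivalent(35, 3)
print(f"{equivalent:.2f}B dense-equivalent")

# Relative size gap versus an 8B dense model:
print(f"{(equivalent / 8 - 1) * 100:.0f}% larger than 8B")
```

sqrt(3 × 35) = sqrt(105) ≈ 10.25, and 10.25 / 8 ≈ 1.28, which is where the ~28% figure comes from.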
irishcoffee 2 days ago
So it’s just like, your opinion, man?

edit: It was a play on The Big Lebowski, folks.