noodletheworld 2 days ago
Having tried it: Qwen is really good. Also, generally, it makes sense. 8B models are generally not very good.^ That this 8B model is decent is impressive, but that it could perform on par with a good model 4 times as large is a daydream.

^ To be polite. Small models + tool use for coding agents are almost universally ass. Proof: my personal experience. I've tried many of them.
meatmanek 2 days ago
It's not that surprising that an 8B dense model would compete with a 35B-A3B MoE model. The geometric mean rule of thumb for MoE models is that the intelligence level of an MoE model with T total parameters and A active parameters is roughly equivalent to that of a dense model with sqrt(A*T) parameters. For qwen3.6-35B-A3B, that equivalent size is about 10.25B, within spitting distance of an 8B model. Good training can make up the ~28% difference in size.
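The arithmetic behind the rule of thumb above can be sketched in a few lines. This is just the commenter's heuristic made explicit, not an established law; the function name is made up for illustration.

```python
import math

def moe_dense_equivalent(total_params_b: float, active_params_b: float) -> float:
    """Geometric-mean rule of thumb: an MoE model with T total and A active
    parameters performs roughly like a dense model of sqrt(A * T) parameters.
    Arguments and return value are in billions of parameters."""
    return math.sqrt(total_params_b * active_params_b)

# 35B total, 3B active, as in the 35B-A3B model discussed above:
equivalent = moe_dense_equivalent(35, 3)
print(f"{equivalent:.2f}B dense-equivalent")

# Relative size gap versus an 8B dense model:
print(f"{(equivalent / 8 - 1) * 100:.0f}% larger than 8B")
```

sqrt(3 × 35) = sqrt(105) ≈ 10.25, and 10.25 / 8 ≈ 1.28, which is where the ~28% figure comes from.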
irishcoffee 2 days ago
So it’s just like, your opinion, man?

edit: It was a play on The Big Lebowski, folks.