Remix.run Logo
juancn 2 days ago

Gemma4 is still power hungry since it tends to activate pretty much every weight.

qwen3-coder-next uses a lot less since it seems to only activate ~3B parameters at a time.

My guess is that this is still close to tech demo, and a lot of performance is left on the table.

fancyfredbot a day ago | parent [-]

The article is about two models which have either 2B or 4B parameters. Both are dense models. The 2B version will certainly use less power than qwen3-coder-next.

The models are quite good. They aren't just a tech demo.