juancn 2 days ago
Gemma4 is still power hungry, since it tends to activate pretty much every weight. qwen3-coder-next uses a lot less, since it seems to activate only ~3B parameters at a time. My guess is that this is still close to a tech demo, and a lot of performance is left on the table.
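The power argument above can be sketched with rough arithmetic: a common estimate is that a transformer forward pass costs about 2 FLOPs per active parameter per token, so a sparse MoE that only activates ~3B parameters pays far less compute per token than its total size suggests. All numbers below are illustrative assumptions, not specs of either model.

```python
# Rough per-token compute comparison for dense vs. mixture-of-experts (MoE).
# Assumption: forward-pass FLOPs/token ~= 2 * active parameters.
# All sizes here are hypothetical placeholders, not the real models' specs.

dense_total = 4e9    # a 4B dense model: every weight is active each token
moe_total = 80e9     # hypothetical MoE total parameter count
moe_active = 3e9     # ~3B parameters activated per token, as claimed above

flops_dense = 2 * dense_total   # 8e9 FLOPs/token
flops_moe = 2 * moe_active      # 6e9 FLOPs/token

print(f"dense 4B model:   {flops_dense:.1e} FLOPs/token")
print(f"MoE, ~3B active:  {flops_moe:.1e} FLOPs/token")
```

Under these assumptions the MoE's per-token compute is set by its active parameters, not its total size, which is why a sparse model can undercut a dense one of comparable capability; a small dense 2B-4B model is still in the same ballpark or cheaper.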
fancyfredbot a day ago | parent
The article is about two models with either 2B or 4B parameters, both of them dense. The 2B version will certainly use less power than qwen3-coder-next. The models are quite good; they aren't just a tech demo.