nh43215rgb 6 days ago

270M is a nice (and rare) addition. Is there a reason this is not categorized as a Gemma 3n model? I thought the small models went under the Gemma 3n category.

rao-v 6 days ago | parent [-]

Not at Google (anymore), but Gemma 3n is a radically different (and very cool) architecture. The MatFormer approach essentially lets you efficiently change how many of the model's parameters you use at inference time. The 2B model they released is just the sub-model embedded in the original 4B model. You can also fiddle with the model and pull out a 2.5B or 3B version too!
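To make the "sub-model embedded in the full model" idea concrete, here is a toy numpy sketch of a MatFormer-style FFN. This is not Gemma code; the sizes and names (`ffn`, `frac`) are made up for illustration. The key point is that the smaller model's weights are just a prefix slice of the full model's weight matrices, so one checkpoint serves every size:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff_full = 8, 32   # toy sizes; real models are far larger
W_in  = rng.standard_normal((d_model, d_ff_full))
W_out = rng.standard_normal((d_ff_full, d_model))

def ffn(x, frac=1.0):
    """Run the FFN using only the first `frac` of the hidden units.

    MatFormer-style: the sub-model's weights are a prefix slice of
    the full model's weights, so shrinking the model needs no
    retraining, just slicing.
    """
    k = int(d_ff_full * frac)
    h = np.maximum(x @ W_in[:, :k], 0.0)   # ReLU over the prefix slice
    return h @ W_out[:k, :]

x = rng.standard_normal(d_model)
y_full = ffn(x, 1.0)   # full model ("4B"-style)
y_half = ffn(x, 0.5)   # extracted sub-model, same weights, fewer used
```

Both calls produce an output of the same shape; they just spend different amounts of compute, which is what lets you dial the parameter count at inference time.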

This is a more traditional LLM architecture (like the original Gemma 3 4B but smaller) and trained on an insane (for the size) number of tokens.

nh43215rgb 6 days ago | parent [-]

Oh ok, thank you. So something like MoE? That may not be quite right, but at least the point is that a model needs a different architecture (MatFormer) to be classified under Gemma 3n.

canyon289 6 days ago | parent [-]

It's not an MoE; it's what's referred to as a dense architecture, the same as the Gemma 3 models (but not 3n, as noted).
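For readers unfamiliar with the dense/MoE distinction, here is a minimal numpy sketch of the two layer styles. This is a generic illustration, not Gemma's actual layers, and all names (`dense_layer`, `moe_layer`, expert/router shapes) are invented. A dense layer uses every parameter for every token; an MoE layer routes each token to a small subset of expert sub-networks:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8               # toy hidden size
n_experts, top_k = 4, 1

# Dense (Gemma 3 style): one weight matrix, used for every token.
W_dense = rng.standard_normal((d, d))
def dense_layer(x):
    return x @ W_dense

# MoE: several expert matrices plus a router; only top_k experts run.
experts = rng.standard_normal((n_experts, d, d))
router  = rng.standard_normal((d, n_experts))
def moe_layer(x):
    scores = x @ router                      # one score per expert
    chosen = np.argsort(scores)[-top_k:]     # keep the best expert(s)
    weights = np.exp(scores[chosen])
    weights /= weights.sum()                 # softmax over chosen experts
    return sum(w * (x @ experts[i]) for i, w in zip(chosen, weights))

x = rng.standard_normal(d)
y_dense = dense_layer(x)
y_moe = moe_layer(x)
```

So an MoE trades total parameter count against per-token compute, while a dense model like the 270M spends all its parameters on every token.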