wanderingmind 6 days ago

Maybe I'm using it wrong, but when I try to use the full-precision FP16 Gemma model, load it into ChatterUI, and ask a simple question,

"write me a template to make a cold call to a potential lead",

it throws back absolute rubbish. On the other hand, the Q8-quantized Qwen 0.6B model nails the answer to the same question.

Qwen 0.6B quantized is smaller than Gemma at full precision. The execution is a tad slower, but not by much. I'm not sure why I'd pick Gemma over Qwen.

mdp2021 6 days ago | parent

As many have repeated here, it's (generally) not for direct use. It is meant to be a good base for fine-tuning, so you end up with something very fast.

(In theory, if you fine-tuned Gemma3:270M on "templating cold calls to leads", it would become both better than Qwen and faster.)

wanderingmind 6 days ago | parent

Why should we start fine-tuning Gemma when it is so bad? Why not instead focus the fine-tuning effort on Qwen, which starts off with much, much better outputs?

mdp2021 6 days ago | parent

Speed-critical applications, I suppose. Have you compared the speeds?

(I did. I won't give you numbers, which I can't remember precisely, but Gemma was much faster. So it will depend on the application.)