wanderingmind 6 days ago
Maybe I'm using it wrong, but when I load the full-precision FP16 model into ChatterUI and ask a simple question, "write me a template to make a cold call to a potential lead", it gives me absolute rubbish. On the other hand, the Q8-quantized Qwen 0.6B model nails the answer to the same question, and Qwen 0.6B is still smaller than full-precision Gemma. Execution is a tad slower, but not by much. I'm not sure why I'd pick Gemma over Qwen.
mdp2021 6 days ago | parent
As many have repeated here, it's (generally) not meant for direct use. It's meant to be a good base for fine-tuning, so you end up with something both specialized and very fast. (In theory, if you fine-tuned Gemma3:270M on "templating cold calls to leads", it would become better than Qwen at that task, and faster.)