andreinwald 2 days ago
It works with the small Llama-3.2-1B model, especially for less powerful GPU devices.
wongarsu a day ago | parent
The answer is still terrible for the model size. Maybe it's the 4-bit quantization; smaller models tend to react worse to that. For reference, [1] is what stock Qwen3-0.6B would answer. Not a perfect answer, but much better at nearly half the number of parameters.

1: https://markdownpastebin.com/?id=7ad4ad9f325d4354a858480abdc...
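To illustrate why 4-bit quantization hits small models hard, here is a toy sketch (my own illustration, not from the thread) using a simple symmetric absmax quantizer. Real 4-bit schemes such as NF4 use non-uniform levels and per-block scales, so this overstates the error somewhat, but the trend is the same: dropping from 8 to 4 bits roughly multiplies the rounding error by 16, and a 1B-parameter model has less redundancy to absorb that noise than a larger one.

```python
import random

random.seed(0)
# Stand-in for a layer's weights: 10k samples from N(0, 1)
weights = [random.gauss(0, 1) for _ in range(10000)]

def absmax_quantize(xs, bits):
    """Round-trip xs through symmetric absmax integer quantization."""
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for 4-bit
    scale = max(abs(x) for x in xs) / qmax     # map [-max, max] onto the int grid
    return [max(-qmax - 1, min(qmax, round(x / scale))) * scale for x in xs]

def mean_abs_err(xs, ys):
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

err8 = mean_abs_err(weights, absmax_quantize(weights, 8))
err4 = mean_abs_err(weights, absmax_quantize(weights, 4))
print(f"8-bit mean abs error: {err8:.5f}")
print(f"4-bit mean abs error: {err4:.5f}")   # roughly 16x larger
```

Halving the bit width doubles the grid spacing, so each bit removed doubles the expected rounding error; 8 → 4 bits compounds to about a 16x increase.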