BoorishBears a day ago
I do? I spend a ton of time post-training models for creative tasks.

The effects of model quantization are usually qualified in terms of performance on benchmaxxed tasks with strong logit probabilities, temp 0, and a "right" answer the model has to pick. Or, even worse, they're measured on metrics that don't map to anything except themselves, like perplexity (https://arxiv.org/pdf/2407.09141).

I agree Q8 is strong, but I also think the effects of quantization are consistently underappreciated. People talk about how these models perform while fundamentally using 10+ variants of a single model, each with a distinct performance profile. Even knowing the bits per weight isn't enough to know exactly how a given quant method affects the model: https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-ggufs
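As a rough sketch of what I mean (assuming PyTorch plus Hugging Face transformers and bitsandbytes; the model id and prompt are placeholders, not from the linked paper): two variants of the same model can report nearly identical perplexity on a passage while their per-token next-token distributions diverge, and that divergence is what you actually feel when sampling for creative work.

    # Compare a full-precision model against a 4-bit quant of itself:
    # headline perplexity vs. per-token KL between next-token distributions.
    import torch
    import torch.nn.functional as F
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    name = "meta-llama/Llama-3.1-8B"  # placeholder model id
    tok = AutoTokenizer.from_pretrained(name)
    full = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16, device_map="auto")
    quant = AutoModelForCausalLM.from_pretrained(
        name, quantization_config=BitsAndBytesConfig(load_in_4bit=True), device_map="auto")

    text = "Once upon a time, in a kingdom by the sea,"
    ids = tok(text, return_tensors="pt").input_ids

    with torch.no_grad():
        logits_f = full(ids.to(full.device)).logits[0, :-1].float()
        logits_q = quant(ids.to(quant.device)).logits[0, :-1].float().to(logits_f.device)
        targets = ids[0, 1:].to(logits_f.device)

        # The usual headline number: perplexity of each variant on the same text.
        ppl_f = torch.exp(F.cross_entropy(logits_f, targets))
        ppl_q = torch.exp(F.cross_entropy(logits_q, targets))

        # Per-token KL(full || quant): where quantization damage shows up even
        # when the perplexity delta looks negligible.
        kl = F.kl_div(F.log_softmax(logits_q, dim=-1),
                      F.log_softmax(logits_f, dim=-1),
                      log_target=True, reduction="none").sum(-1)

    print(f"perplexity: full={ppl_f:.3f} quant={ppl_q:.3f}")
    print(f"per-token KL: mean={kl.mean():.4f} max={kl.max():.4f}")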
imtringued 8 hours ago
If you've trained your own models, you'd be aware of quantization-aware training.
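The core trick, roughly (a plain PyTorch sketch; class and variable names are illustrative, not from any particular library): fake-quantize the weights in the forward pass so the network learns to tolerate rounding error, while a straight-through estimator lets gradients flow back to the full-precision master weights.

    import torch
    import torch.nn as nn

    class FakeQuant(torch.autograd.Function):
        @staticmethod
        def forward(ctx, w, num_bits=8):
            qmax = 2 ** (num_bits - 1) - 1
            scale = w.abs().max() / qmax                 # per-tensor symmetric scale
            return torch.round(w / scale).clamp(-qmax, qmax) * scale

        @staticmethod
        def backward(ctx, grad_out):
            return grad_out, None                        # straight-through estimator

    class QATLinear(nn.Linear):
        def forward(self, x):
            # Run the layer with quantized weights; keep fp32 weights for updates.
            return nn.functional.linear(x, FakeQuant.apply(self.weight), self.bias)

    layer = QATLinear(16, 4)
    layer(torch.randn(2, 16)).sum().backward()           # gradients reach the fp32 weights
    print(layer.weight.grad.shape)                       # torch.Size([4, 16])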
danielmarkbruce 21 hours ago
"Nobody really cares if it meets a strict definition of lossless" != "quantization can be done haphazardly." | |||||||||||||||||||||||||||||||||||