danielmarkbruce 21 hours ago
"Nobody really cares if it meets a strict definition of lossless" != "quantization can be done haphazardly." | ||||||||||||||||||||||||||
BoorishBears 21 hours ago | parent
If you're snarkily referring to the article on Dynamic Quants 2.0 and how carefully they were developed: they're comparing their quants to the methodology that 99.99% of the quants out there use.

The problem is not that people are making quants "haphazardly", it's that people keep parroting that various quants are "practically lossless" when they actually have no clue how lossy they are, given how application-specific the concept is for something as multidimensional as an LLM. The moment anyone tries a little harder to quantify how lossy they are, we repeatedly find that the answer is "not lossless by any reasonable definition".

Even their example where Q4 is <1% away on 5-shot MMLU is probably massively helped by a calibration dataset that maps really well to MMLU-style tasks, just like constantly using WikiText massively helps models that were trained on... tons of text from Wikipedia. So unless you're doing your own calibrated quantization with your own dataset (which is not impossible, but also nowhere near common), even their "non-haphazard" method could have a noticeable impact on performance.
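To make "quantify how lossy they are" concrete, here's a minimal sketch (my addition, not from the article) of checking a quant against the full-precision model on your own prompts by comparing next-token distributions with KL divergence. The model name and the bitsandbytes 4-bit config are stand-ins for whatever model and quant you actually run:

    # Sketch: measure how far a quantized model drifts from the full-precision
    # one on YOUR data, via per-token KL(full || quant). Model id and the 4-bit
    # bitsandbytes config below are illustrative assumptions, not the article's setup.
    import torch
    import torch.nn.functional as F
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "meta-llama/Llama-3.1-8B"           # assumption: swap in your model
    prompts = ["<your own task-specific prompts here>"]

    tok = AutoTokenizer.from_pretrained(model_id)
    fp_model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto")
    q_model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # stand-in quant
        device_map="auto")

    kls = []
    for p in prompts:
        ids = tok(p, return_tensors="pt").input_ids.to(fp_model.device)
        with torch.no_grad():
            fp_logits = fp_model(ids).logits[0]                     # (seq_len, vocab)
            q_logits = q_model(ids).logits[0].to(fp_logits.device)
        # KL(full || quant) per token position, summed over vocab, averaged over tokens
        kl = F.kl_div(
            F.log_softmax(q_logits.float(), dim=-1),
            F.log_softmax(fp_logits.float(), dim=-1),
            log_target=True,
            reduction="none",
        ).sum(-1).mean()
        kls.append(kl.item())

    print(f"mean per-token KL(full || quant): {sum(kls) / len(kls):.4f}")

The point of doing it on your own prompts rather than WikiText or MMLU is exactly the calibration issue above: a divergence number measured on the data you actually serve tells you something a generic benchmark delta can't.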