armcat 6 hours ago

This is beautifully written and visualised, well done! The KL divergence comparisons between the original and the different quantisation levels are on point. I'm not sure people realise how powerful quantisation methods are and what they've done for democratising local AI. And there are some great players out there like Unsloth and Pruna.
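For anyone curious, that comparison is roughly this (a sketch assuming you've already collected logits from both models on the same inputs; the tensor names are made up):

    import torch
    import torch.nn.functional as F

    # KL(original || quantised), averaged over token positions.
    # logits_fp16 / logits_q4 are hypothetical [seq_len, vocab] tensors
    # from the full-precision and the quantised model on the same prompt.
    def kl_vs_original(logits_fp16, logits_q4):
        log_p = F.log_softmax(logits_fp16, dim=-1)
        log_q = F.log_softmax(logits_q4, dim=-1)
        return F.kl_div(log_q, log_p, log_target=True, reduction="batchmean")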

samwho 5 hours ago | parent [-]

Thank you! I was really surprised by how robust models are to losing information. It seems wrong that they can be compressed so much and still function at all, never mind perform so close to the full-precision original.
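Even the naive version throws a huge amount of precision away and the model still mostly works. A toy sketch of round-to-nearest 4-bit quantisation (not what the serious quantisers do, they use per-group scales and cleverer rounding, but the information loss is the same idea):

    import torch

    def quantize_int4(w):
        scale = w.abs().max() / 7             # map the tensor onto the int4 range
        q = (w / scale).round().clamp(-8, 7)  # every weight becomes one of 16 levels
        return q * scale                      # dequantise for use in matmuls

    w = torch.randn(4096, 4096)
    print((w - quantize_int4(w)).abs().mean())  # mean absolute rounding error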

Think we're only going to keep seeing more progress in this area on the research side, too.

buildbot 4 hours ago | parent [-]

You can even train in 4 and 8 bits with the newer microscaling formats! From the original MX paper (https://arxiv.org/pdf/2310.10537) to gpt-oss being trained (partially) natively in MXFP4 - https://huggingface.co/blog/RakshitAralimatti/learn-ai-with-...

And now Nemotron 3 Super, which had 25T tokens of NVFP4-native pretraining! https://docs.nvidia.com/nemotron/0.1.0/nemotron/super3/pretr...
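The core microscaling trick is small blocks sharing a power-of-two scale, with each element snapped to a tiny FP4 grid. A rough numpy sketch of the MXFP4 idea (illustrative only, not the spec's exact rounding rules):

    import numpy as np

    FP4_GRID = np.array([0, 0.5, 1, 1.5, 2, 3, 4, 6])  # E2M1 magnitudes

    def mxfp4_block(x):  # x: a block of 32 floats
        amax = max(np.abs(x).max(), 1e-12)          # guard against all-zero blocks
        scale = 2.0 ** np.floor(np.log2(amax / 6))  # shared power-of-two (E8M0-style) scale
        mags = np.abs(x) / scale
        idx = np.abs(mags[:, None] - FP4_GRID[None, :]).argmin(axis=1)
        return np.sign(x) * FP4_GRID[idx] * scale   # dequantised block

    x = np.random.randn(32).astype(np.float32)
    print(np.abs(x - mxfp4_block(x)).mean())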