| ▲ | samwho 5 hours ago | |
Thank you! I was really surprised how robust models are to losing information. It seems wrong that they can be compressed so much and still function at all, never mind function quite closely to the original size. Think we're only going to keep seeing more progress in this area on the research side, too. | ||
| ▲ | buildbot 4 hours ago | parent [-] | |
You can even train in 4 & 8 bits with newer microscaled formats! From https://arxiv.org/pdf/2310.10537 to gpt-oss being trained (partially) natively in MXFP4 - https://huggingface.co/blog/RakshitAralimatti/learn-ai-with-... To Nemotron 3 Super, which had 25T of nvfp4 native pretraining! https://docs.nvidia.com/nemotron/0.1.0/nemotron/super3/pretr... | ||