TIL:
> Quantization-Aware Training (QAT) [...] allows preserving similar quality to bfloat16 while dramatically reducing the memory requirements to load the model