yinksta | 2 hours ago
The industry has largely moved away from QAT because the hardware required to run a quantized model is an order of magnitude less than what's needed to train/QAT the full-precision model. That's why methods like AutoRound, GPTQ, and AWQ have been so popular: thanks to their data efficiency, you don't even need enough hardware to run the original model on GPU; CPU is enough. To make the cost profile concrete, see the sketch below.
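Here's a rough sketch of why weight-only post-training quantization is so cheap: it needs at most forward passes over a small calibration set (the plain round-to-nearest variant below needs no data at all), never gradients through the full-precision model, and can be done layer by layer so the whole model never has to fit on a GPU. GPTQ/AWQ/AutoRound improve on this by using calibration data to reduce per-layer output error, but the hardware story is similar. The function names here are illustrative, not any library's actual API:

    import torch

    def quantize_weight_int4(w: torch.Tensor, group_size: int = 128):
        """Round-to-nearest 4-bit weight quantization with per-group scales."""
        out_features, in_features = w.shape
        w_groups = w.reshape(out_features, in_features // group_size, group_size)
        # One scale per group, chosen so the max magnitude maps to the int4 range.
        scales = w_groups.abs().amax(dim=-1, keepdim=True) / 7.0
        q = torch.clamp(torch.round(w_groups / scales), -8, 7)
        return q.to(torch.int8), scales  # int8 storage for the 4-bit codes

    def dequantize(q: torch.Tensor, scales: torch.Tensor, shape):
        return (q.float() * scales).reshape(shape)

    # Quantize one linear layer's weight on CPU and check reconstruction error.
    w = torch.randn(4096, 4096)
    q, s = quantize_weight_int4(w)
    w_hat = dequantize(q, s, w.shape)
    print((w - w_hat).abs().mean())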
liuliu | 6 minutes ago | parent
Thanks. I think that's a good explanation, but it also suggests a gap. QAT, to me, if done right, is the only way to recover performance in the extreme quantization regime. The only thing that matters, of course, is whether it can work. My confidence in QAT comes from the fact that LoRA can recover most of the quality lost to quantization, but that is still different from QAT for extreme quantization, so that intuition could be very wrong. I need to try it anyway.
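For contrast with the post-training approach above, here's a minimal sketch of the usual QAT idea, assuming fake quantization with a straight-through estimator (STE): weights are quantized in the forward pass, but gradients flow to full-precision shadow weights as if quantization were the identity. This is just an illustration of the general technique, not anyone's specific recipe; the point is that the training loop needs the same gradients and optimizer state as full-precision training, which is exactly the cost the parent comment describes:

    import torch
    import torch.nn as nn

    class FakeQuant(torch.autograd.Function):
        @staticmethod
        def forward(ctx, w, scale):
            # Symmetric 4-bit fake quantization: quantize, then dequantize.
            return torch.clamp(torch.round(w / scale), -8, 7) * scale

        @staticmethod
        def backward(ctx, grad_output):
            # STE: pass gradients straight through to the fp weights.
            return grad_output, None

    class QATLinear(nn.Linear):
        def forward(self, x):
            scale = self.weight.abs().max() / 7.0 + 1e-8
            w_q = FakeQuant.apply(self.weight, scale)
            return nn.functional.linear(x, w_q, self.bias)

    # The training step looks exactly like ordinary fine-tuning.
    layer = QATLinear(64, 64)
    opt = torch.optim.AdamW(layer.parameters(), lr=1e-4)
    x, target = torch.randn(8, 64), torch.randn(8, 64)
    loss = nn.functional.mse_loss(layer(x), target)
    loss.backward()
    opt.step()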