liuliu an hour ago

Thanks. I think it is a good explanation, but it also suggests a gap. To me, QAT, if done right, is the only way to recover performance in the extreme quantization regime. The only thing that matters, of course, is whether it can work. My confidence in QAT comes from the fact that LoRA can recover most of the quality lost to quantization, but that is still different from QAT for extreme quantization, so I could be very wrong. I need to try it anyway.