jychang 2 hours ago

Yeah, I saw that yesterday. The blog post does not explain why/how the Qwen 3.5 quants uploaded on 2/27 are different from the files uploaded on 2/24.

Old 2/24 Q4_K_XL commit (pre bugfix files): https://huggingface.co/unsloth/Qwen3.5-35B-A3B-GGUF/commit/7...

Questions for a postmortem that the blog post left unanswered:

- Why the change? Is it just to improve PPL/KLD? Sure, we can assume PPL and KLD are not perfect benchmarks, but if so, why change the quantization at all? Or was the old 2/24 quant actually performing much worse in the real world? I presume the Q4_K_XL quant using mxfp4 was the issue? If the 2/24 files having a lower PPL is a real problem caused by low-quality tensors, then why not just say that?

- Which tensors had their quantizations changed from 2/24 to 2/27? Are the attention tensors now quantized differently? Or perhaps the ssm tensors?

- What was it changed from and to? mxfp4 or q4_k to q8, or something else?

A quick sentence in the blog post saying "ok, we've confirmed that using mxfp4 (or q3 or whatever) in the attention/ssm/biases/norms/etc is a bad idea; we had that in our old models on 2/24 and our new models today are better" would make it clear. As written, the post tries to say both "PPL/KLD don't actually reflect real-world quality" and "we changed our quant to increase PPL/KLD" at the same time, which seems contradictory.
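For context, the KLD metric being debated here is usually computed per token position: run the same prompts through the full-precision reference model and the quantized model, then take the KL divergence between their output distributions. A minimal sketch, assuming you already have the two models' logits as arrays (the array shapes and function names here are illustrative, not any tool's actual API):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_token_kld(ref_logits, quant_logits):
    """Mean per-token KL(ref || quant).

    ref_logits, quant_logits: float arrays of shape (n_tokens, vocab_size),
    e.g. logits from the unquantized model and a quantized GGUF run on
    identical prompts. Lower is better; 0 means identical distributions.
    """
    p = softmax(ref_logits)
    q = softmax(quant_logits)
    # KL(p || q) summed over the vocab, averaged over token positions.
    kld = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(kld.mean())
```

A quant change that only touches low-impact tensors should barely move this number, which is why a quant revision that visibly shifts PPL/KLD invites exactly the "which tensors changed?" question above.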