bildung 8 hours ago

Fair enough, appreciate the detailed response! Can you elaborate on why other quantizations weren't affected (e.g. bartowski)? Simply because they were straight Q4 etc. for every layer?

danielhanchen 7 hours ago | parent [-]

No, Bartowski's are more affected (38% NaN) than ours (22%) for MiniMax 2.7. See https://www.reddit.com/r/LocalLLaMA/comments/1slk4di/minimax...

We already fixed ours. Bart hasn't yet but is still working on it following our findings.

blk.61.ffn_down_exps failed in Q4_K or Q5_K. It must be in Q6_K, otherwise it overflows.

For the others, yes, some layers don't work at certain precisions. E.g. for Qwen3.5, ssm_out must be minimum Q4-Q6_K.

ssm_alpha and ssm_beta must be Q8_0 or higher.

Again, Bart and others apply our findings. See https://www.reddit.com/r/LocalLLaMA/comments/1rgel19/new_qwe...
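
For anyone wanting to picture the per-tensor constraints above, here is a rough sketch of a "minimum precision" table in Python. It is not how any actual quantization tool is implemented. `QUANT_ORDER`, `MIN_QUANT`, and `pick_quant` are hypothetical names made up for illustration, the ordering of the four types is simply by nominal bit width, and the table mixes the MiniMax and Qwen3.5 rules into one dict for brevity. The thread gives "Q4-Q6_K" for ssm_out, so the Q4_K floor used here is an assumption.

```python
import fnmatch

# Lower to higher precision, by nominal bit width (assumption for this sketch).
QUANT_ORDER = ["Q4_K", "Q5_K", "Q6_K", "Q8_0"]

# Hypothetical floor table based on the constraints mentioned in the thread.
# The first rule is the MiniMax one, the ssm_* rules are the Qwen3.5 ones.
MIN_QUANT = {
    "blk.61.ffn_down_exps*": "Q6_K",  # Q4_K / Q5_K reportedly overflow
    "*ssm_alpha*": "Q8_0",
    "*ssm_beta*": "Q8_0",
    "*ssm_out*": "Q4_K",              # thread says "Q4-Q6_K"; lower end assumed
}


def pick_quant(tensor_name: str, default: str) -> str:
    """Return the default type, raised to the floor of any matching rule."""
    chosen = default
    for pattern, floor in MIN_QUANT.items():
        if fnmatch.fnmatchcase(tensor_name, pattern):
            # Only ever raise precision, never lower it below the default.
            if QUANT_ORDER.index(floor) > QUANT_ORDER.index(chosen):
                chosen = floor
    return chosen


if __name__ == "__main__":
    for name in ["blk.61.ffn_down_exps.weight",
                 "blk.3.ssm_alpha.weight",
                 "blk.0.attn_q.weight"]:
        print(name, "->", pick_quant(name, "Q4_K"))
```

The point of the sketch is only that a "straight Q4 for every layer" recipe has no such floor, so a sensitive tensor gets the same type as everything else.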

bildung 7 hours ago | parent [-]

Thanks again, TIL

danielhanchen 7 hours ago | parent [-]

Thanks!