bildung 8 hours ago
Fair enough, appreciate the detailed response! Can you elaborate on why other quantizations weren't affected (e.g. bartowski's)? Simply because they used straight Q4 etc. for every layer?
danielhanchen 7 hours ago
No, Bartowski's are more affected (38% NaN) than ours (22%) for MiniMax 2.7; see https://www.reddit.com/r/LocalLLaMA/comments/1slk4di/minimax... We already fixed ours. Bart hasn't yet, but is working on it following our findings. blk.61.ffn_down_exps failed in Q4_K and Q5_K; it must be in Q6_K, otherwise it overflows. For the others, yes, some layers don't work in certain precisions. For example, in Qwen3.5, ssm_out must be at minimum Q4-Q6_K, and ssm_alpha and ssm_beta must be Q8_0 or higher. Again, Bart and others apply our findings; see https://www.reddit.com/r/LocalLLaMA/comments/1rgel19/new_qwe...
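
To make the per-tensor floors above concrete, here is a minimal Python sketch that checks a quantization plan against them. It is not how any quantizer actually does this. The precision ordering, the full tensor names in the usage example, and the `violations` helper are all hypothetical. Treating ssm_out's floor as Q4_K is one reading of "Q4-Q6_K", not a confirmed value.

```python
import re

# Rough low-to-high precision ordering; an illustrative assumption.
QUANT_RANK = {"Q4_K": 0, "Q5_K": 1, "Q6_K": 2, "Q8_0": 3, "F16": 4}

# Minimum type per tensor-name pattern, transcribed from the comment above.
MIN_QUANT = [
    (re.compile(r"^blk\.61\.ffn_down_exps"), "Q6_K"),  # MiniMax: overflows below Q6_K
    (re.compile(r"ssm_(alpha|beta)"), "Q8_0"),         # Qwen3.5: Q8_0 or higher
    (re.compile(r"ssm_out"), "Q4_K"),                  # assumption: low end of "Q4-Q6_K"
]


def violations(plan):
    """Return (tensor, chosen, required) for every tensor below its floor."""
    bad = []
    for name, chosen in plan.items():
        for pattern, required in MIN_QUANT:
            if pattern.search(name) and QUANT_RANK[chosen] < QUANT_RANK[required]:
                bad.append((name, chosen, required))
                break
    return bad


# Hypothetical plan mixing tensors from both models, for illustration only.
plan = {
    "blk.61.ffn_down_exps.weight": "Q4_K",
    "blk.60.ffn_down_exps.weight": "Q4_K",
    "blk.3.ssm_alpha.weight": "Q8_0",
    "blk.3.ssm_beta.weight": "Q6_K",
    "blk.3.ssm_out.weight": "Q5_K",
}
for tensor, chosen, required in violations(plan):
    print(f"{tensor}: {chosen} is below the required {required}")
```

With this plan, only blk.61.ffn_down_exps and ssm_beta are flagged. A "straight Q4 for every layer" quant would trip the first two rules, which fits the point that uniform quants are not automatically safe.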