gpm 6 hours ago
Huh, cool. I guess that makes a lot of sense given all the success the quantization people have been having. So am I misunderstanding "Tensor type F32 · I32 · BF16", or is it just tagged wrong?
rockinghigh 4 hours ago | parent
The MoE expert weights are quantized to int4; all other weights, such as the shared expert weights, are excluded from quantization and kept in bf16.
liuliu 4 hours ago | parent
The I32 tensors are eight 4-bit values packed into one int32.
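A minimal sketch of that packing scheme: eight 4-bit values sharing one 32-bit word. The bit ordering (value i in bits 4i..4i+3) is an assumption for illustration, not necessarily the layout the actual checkpoint uses.

```python
def pack_int4(values):
    """Pack eight 4-bit unsigned values (0..15) into one 32-bit int."""
    assert len(values) == 8 and all(0 <= v < 16 for v in values)
    packed = 0
    for i, v in enumerate(values):
        # assumed ordering: value i occupies bits [4*i, 4*i + 4)
        packed |= v << (4 * i)
    return packed

def unpack_int4(packed):
    """Recover the eight 4-bit values from a packed 32-bit int."""
    return [(packed >> (4 * i)) & 0xF for i in range(8)]

vals = [1, 2, 3, 4, 5, 6, 7, 8]
packed = pack_int4(vals)
assert unpack_int4(packed) == vals
assert packed.bit_length() <= 32  # fits in a single int32 slot
```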