kadushka 3 hours ago

Most quantization papers I've seen report non-trivial degradation on standard benchmarks, roughly 1-10% compared to FP16/BF16, especially at 4 bits or lower. For example, I just opened a random paper: https://arxiv.org/pdf/2410.09426 (see Table 1).

P.S. Dense vs MoE: both are being released because they offer different trade-offs. At the same level of quality, an MoE will use less compute per token, but more memory.
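To make the compute/memory trade-off concrete, here is a rough back-of-the-envelope sketch. The parameter counts are made up for illustration (I'm simply assuming the two models land at similar quality), and the "2 FLOPs per active parameter per token" figure is the usual rule of thumb for a forward pass, not a measurement.

```python
from dataclasses import dataclass


@dataclass
class ModelSpec:
    name: str
    total_params: float   # all weights that must be resident in memory
    active_params: float  # weights actually used for each token

    def weight_memory_gb(self, bytes_per_param: float = 2.0) -> float:
        # 2 bytes per parameter corresponds to FP16/BF16 weights.
        return self.total_params * bytes_per_param / 1e9

    def flops_per_token(self) -> float:
        # Rule of thumb: ~2 FLOPs per active parameter for one forward pass.
        return 2.0 * self.active_params


# Hypothetical pair assumed (not measured) to be of similar quality.
dense = ModelSpec("dense", total_params=30e9, active_params=30e9)
moe = ModelSpec("moe", total_params=47e9, active_params=13e9)

for m in (dense, moe):
    print(
        f"{m.name}: {m.weight_memory_gb():.0f} GB of weights, "
        f"{m.flops_per_token() / 1e9:.0f} GFLOPs per token"
    )
```

With these assumed numbers the MoE needs more memory to hold its weights but does less work per token, since only the routed experts run.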