scrlk 6 hours ago

IME, unquantised -> FP8 is pretty much lossless. What matters more is having an unquantised KV cache: using an FP8 KV cache can result in a significant drop in quality.

johnnyApplePRNG 3 hours ago

>unquantised -> FP8 is pretty much lossless

Claude Shannon is rolling in his grave.

ComputerGuru 4 hours ago

Do infra providers reveal that level of implementation detail?

scrlk 4 hours ago

I've seen a few articles from providers talking about KV cache quantisation, but it's not something they explicitly point out like they do with weights.

So you could end up paying more for unquantised weights, only to get silently hit with a quantised KV cache...
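
For anyone who wants a feel for the rounding error this thread is arguing about, here is a rough sketch of an FP8 round trip in plain Python. It assumes the E4M3 layout commonly used for inference (3 mantissa bits, smallest normal 2**-6, saturation at 448) and a single per-tensor scale. `round_to_e4m3` and `quantize_dequantize` are made-up helper names, not any provider's or library's API.

```python
import math

E4M3_MAX = 448.0   # largest finite value in the assumed E4M3 format
MIN_EXP = -6       # exponent of the smallest normal value (2**-6)
MANT_BITS = 3      # explicit mantissa bits


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value on the assumed E4M3 grid, saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    a = min(abs(x), E4M3_MAX)
    _, e = math.frexp(a)            # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, MIN_EXP)       # below 2**-6 the grid spacing stops shrinking (subnormals)
    step = 2.0 ** (exp - MANT_BITS)
    q = round(a / step) * step      # Python's round() breaks ties to even
    return math.copysign(q, x)


def quantize_dequantize(values):
    """Scale so the largest magnitude maps to 448, round to the grid, then scale back."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return list(values)
    scale = amax / E4M3_MAX         # hypothetical per-tensor scale, chosen for illustration
    return [round_to_e4m3(v / scale) * scale for v in values]


if __name__ == "__main__":
    values = [0.013, -0.4, 0.77, 1.9, -3.2]
    for v, r in zip(values, quantize_dequantize(values)):
        print(f"{v:+.4f} -> {r:+.4f} (rel. error {abs(v - r) / abs(v):.2%})")
```

With 3 mantissa bits, the per-element relative error in the normal range is at most 2**-4 (6.25%). This only measures element-wise rounding. It says nothing by itself about whether weights or the KV cache tolerate that error better, which is the empirical claim made above.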