Remix.run Logo
gajjanag 11 hours ago

TurboQuant is known across the industry to not be state of the art. There are superior schemes for KV quant at every bitrate. Eg, SpectralQuant: https://github.com/Dynamis-Labs/spectralquant among many, many papers.

> Given that TurboQuant results in a 6x reduction in memory usage for KV caches

All depends on baseline. The "6x" is by stylistic comparison to a BF16 KV cache; not a state of the art 8 or 4 bit KV cache scheme.