| ▲ | gajjanag 11 hours ago | |
TurboQuant is known across the industry to not be state of the art. There are superior schemes for KV quant at every bitrate. Eg, SpectralQuant: https://github.com/Dynamis-Labs/spectralquant among many, many papers. > Given that TurboQuant results in a 6x reduction in memory usage for KV caches All depends on baseline. The "6x" is by stylistic comparison to a BF16 KV cache; not a state of the art 8 or 4 bit KV cache scheme. | ||