amitport 8 hours ago
This is a great development for KV cache compression. I did notice a missing citation in the related work regarding the core mathematical mechanism, though. The foundational technique of applying a geometric rotation prior to extreme quantization, specifically for managing the high-dimensional geometry and enabling proper bias correction, was introduced in our NeurIPS 2021 paper, "DRIVE" (https://proceedings.neurips.cc/paper/2021/hash/0397758f8990c...). We used this exact rotational approach and a similar bias correction mechanism to achieve optimal distributed mean estimation. I also presented this work and subsequent papers in a private invited talk at Google shortly after publication. Given the strong theoretical overlap with the mechanisms in TurboQuant and PolarQuant, I hope to see this prior art acknowledged in the upcoming camera-ready versions.
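The rotate-then-quantize idea above can be sketched in a few lines. This is a minimal illustration of the general technique (random orthogonal rotation, 1-bit sign quantization, and a scale chosen so the reconstruction is well-aligned with the original vector), not the exact DRIVE or TurboQuant algorithm; all function names here are made up for the example, and the sender and receiver are assumed to share a rotation seed:

```python
import numpy as np

def random_rotation(d, seed):
    # Random orthogonal matrix via QR of a Gaussian matrix.
    # Sender and receiver regenerate it from the shared seed,
    # so only the sign bits and one scalar need to be transmitted.
    g = np.random.default_rng(seed).standard_normal((d, d))
    q, r = np.linalg.qr(g)
    return q * np.sign(np.diag(r))  # sign fix makes Q unique

def encode(x, seed):
    R = random_rotation(len(x), seed)
    z = R @ x                       # rotation spreads energy evenly
    bits = np.where(z >= 0, 1.0, -1.0)  # 1 bit per coordinate
    # Scale minimizing the L2 error of (scale * bits) vs. z:
    scale = (z @ bits) / len(x)
    return bits, scale

def decode(bits, scale, seed):
    R = random_rotation(len(bits), seed)
    return R.T @ (scale * bits)     # undo the rotation

# Round-trip demo on a random vector.
x = np.random.default_rng(0).standard_normal(64)
bits, scale = encode(x, seed=42)
x_hat = decode(bits, scale, seed=42)
```

After rotation the coordinates are approximately i.i.d. Gaussian regardless of the input's structure, which is what makes a single shared scale for the sign bits work well; with this scale the reconstruction's cosine similarity to `x` concentrates around sqrt(2/pi) ≈ 0.8 in high dimension.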
eecc 4 hours ago
Pardon my simplistic question, but when you say "rotation", you're essentially talking about diagonalization, aren't you? So storing the diagonal matrix and the new basis is more compact?
jmalicki 2 hours ago
If they didn't cite your paper, that's bullshit. But if they read your paper closely enough to invite you for a talk, they were probably far enough along toward independently inventing it that they would have done so anyway, and wanted to chat with someone who was already working on the same thing. Good ideas tend to reveal themselves to anyone who is aware of the problem.
busfahrer 5 hours ago
I just today learned about Multi-Head Latent Attention, which is also sort of a way of compressing the KV cache. Can someone explain how this new development relates to MHLA?
sva_ 3 hours ago
Schmidhuber'd