lucrbvi 7 hours ago
Sounds like Multi-Head Latent Attention (MLA) from DeepSeek
veunes 6 hours ago | parent
Nah, those are completely different beasts. DeepSeek's MLA tackles the KV-cache problem with a low-rank projection: they literally squeeze the keys and values through a shared latent vector at train time. TurboQuant is just post-training quantization, where they mathematically compress existing weights and activations using polar coordinates
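To see why the low-rank trick shrinks the KV cache, here's a toy numpy sketch of the MLA idea (dimensions and weight names are made up for illustration, not DeepSeek's actual code): you cache one small latent per token and reconstruct K and V from it on the fly, instead of caching both full-width matrices.

```python
import numpy as np

# Toy dimensions (illustrative, not DeepSeek's real config)
d_model, d_latent, seq_len = 512, 64, 128

rng = np.random.default_rng(0)
W_down = rng.standard_normal((d_model, d_latent)) * 0.01   # learned down-projection
W_up_k = rng.standard_normal((d_latent, d_model)) * 0.01   # up-projection to keys
W_up_v = rng.standard_normal((d_latent, d_model)) * 0.01   # up-projection to values

h = rng.standard_normal((seq_len, d_model))  # hidden states for a sequence

# Standard attention caches full K and V: 2 * seq_len * d_model floats.
# MLA-style caching stores only the shared latent c: seq_len * d_latent floats.
c = h @ W_down     # latent KV cache, shape (seq_len, d_latent)
K = c @ W_up_k     # keys reconstructed on the fly
V = c @ W_up_v     # values reconstructed on the fly

full_cache = 2 * seq_len * d_model
latent_cache = seq_len * d_latent
print(f"cache shrinks {full_cache / latent_cache:.0f}x")
```

With these toy numbers the cache shrinks 16x; the point is that the compression is baked into the architecture at train time, whereas PTQ approaches like TurboQuant compress an already-trained model after the fact.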