Remix clone Hacker News


	▲	veunes 6 hours ago
		Nah, those are completely different beasts. DeepSeek's MLA solves the KV cache issue via low-rank projection: they literally squeeze the matrix through a latent vector at train time. TurboQuant is just post-training quantization, where they mathematically compress existing weights and activations using polar coordinates.
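		To make the low-rank point concrete, here is a rough sketch of the idea, not DeepSeek's actual code. The sizes and the names `w_down`, `w_up_k`, `w_up_v` are made up for illustration, the weights are random rather than trained, and multi-head structure and positional-encoding details are left out. The point is only that the cache holds a small latent per token and the keys and values are rebuilt from it.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rand_matrix(rows, cols, rng):
    """Random stand-in for a weight or activation matrix."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
seq_len, d_model, d_latent = 8, 64, 8  # hypothetical sizes

# Hypothetical projections (random here; in a real model they are learned).
w_down = rand_matrix(d_model, d_latent, rng)  # hidden state -> latent
w_up_k = rand_matrix(d_latent, d_model, rng)  # latent -> keys
w_up_v = rand_matrix(d_latent, d_model, rng)  # latent -> values

hidden = rand_matrix(seq_len, d_model, rng)

# Only the latent is stored in the cache: seq_len x d_latent.
latent_cache = matmul(hidden, w_down)

# Keys and values are rebuilt from the latent when attention needs them.
keys = matmul(latent_cache, w_up_k)
values = matmul(latent_cache, w_up_v)

# Floats per layer: separate K and V caches vs. a single latent cache.
full_cache_floats = 2 * seq_len * d_model
latent_cache_floats = seq_len * d_latent
print(full_cache_floats, latent_cache_floats)
```

		With these toy sizes the latent cache is 16x smaller than storing K and V separately, at the cost of two extra matmuls when reading the cache.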
	▲	esafak 2 hours ago
		No, it is about compressing the KV cache; see "How TurboQuant works".