bluequbit 9 hours ago

I did not understand what polarQuant is.

Is it something like pattern-based compression, where the algorithm finds repeating patterns and builds an index of those common symbols or numbers?

Maxious 9 hours ago | parent | next [-]

https://mesuvash.github.io/blog/2026/turboquant-interactive/ has a little visualisation

pstoll 4 hours ago | parent | next [-]

Good post, but the link at the end is broken.

“For the full technical explanation with equations, proofs, and PyTorch pseudocode, see the companion post: TurboQuant: Near-Optimal Vector Quantization Without Looking at Your Data.”

spencerflem 9 hours ago | parent | prev [-]

I like the visualization, but I don’t understand the grid quantization. If every point is on the unit circle, aren’t all the center grid coords unused?
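To make the question concrete, here is a minimal sketch (not taken from the linked post or the paper) that lays a uniform grid over [-1, 1]^2 and counts how many cells contain a point of the unit circle. The function name `used_grid_cells` and the 16x16 grid size are assumptions chosen for illustration.

```python
import math

def used_grid_cells(cells_per_axis, num_points=10_000):
    """Count cells of a uniform grid over [-1, 1]^2 hit by unit-circle points."""
    used = set()
    for i in range(num_points):
        theta = 2 * math.pi * i / num_points
        x, y = math.cos(theta), math.sin(theta)
        # Map a coordinate in [-1, 1] to a cell index in [0, cells_per_axis - 1].
        col = min(int((x + 1) / 2 * cells_per_axis), cells_per_axis - 1)
        row = min(int((y + 1) / 2 * cells_per_axis), cells_per_axis - 1)
        used.add((row, col))
    return len(used)

cells = 16  # hypothetical grid resolution
used = used_grid_cells(cells)
print(f"{used} of {cells * cells} cells contain a point on the unit circle")
```

Only the ring of cells that the circle passes through is ever used, so most of the index space of a full grid would be wasted on points that are already normalized.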

fc417fc802 3 hours ago | parent | next [-]

Yeah that's odd. It seems like you'd want an n-1 dimensional grid on the surface of the unit sphere rather than an n dimensional grid within which the sphere resides.

Looking at the paper (https://arxiv.org/abs/2504.19874) they cite earlier work that does exactly that. They object that grid projection and binary search perform exceptionally poorly on the GPU.

I don't think they're using a regular grid as depicted on the linked page. Equation 4 from the paper is how they compute centroids for the MSE optimal quantizer.

Why specify MSE optimal, you ask? It turns out there are actually two quantization steps, a detail also omitted from the linked page. They apply QJL quantization to the residual of the grid quantized data.

My description is almost certainly missing key details; I'm not great at math and this is sufficiently dense to be a slog.
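A rough sketch of the two-step idea described above, under loud assumptions: the first stage here is plain coordinate rounding (a stand-in, not the centroids from Equation 4), and the second stage keeps one sign bit per Gaussian random projection of the residual, which is my reading of what a QJL-style sketch does. All names (`coarse_quantize`, `sign_sketch`, `estimate_inner_product`) and the sizes `d` and `m` are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def coarse_quantize(vec, step=0.5):
    """Stage 1 stand-in: round each coordinate to a multiple of `step`."""
    return [step * round(x / step) for x in vec]

def sign_sketch(residual, projections):
    """Stage 2: keep one sign bit per random projection of the residual."""
    return [1.0 if dot(row, residual) >= 0 else -1.0 for row in projections]

def estimate_inner_product(query, coarse, bits, residual_norm, projections):
    """Coarse inner product plus a sign-sketch correction for the residual."""
    m = len(projections)
    corr = sum(dot(row, query) * b for row, b in zip(projections, bits))
    # For Gaussian rows, E[(s.q) * sign(s.r)] = sqrt(2/pi) * (q.r) / |r|,
    # so this scaling makes the correction unbiased for q.r.
    corr *= math.sqrt(math.pi / 2) * residual_norm / m
    return dot(query, coarse) + corr

rng = random.Random(0)
d, m = 8, 4000  # hypothetical vector dimension and number of sign bits
key = [rng.gauss(0, 1) for _ in range(d)]
query = [rng.gauss(0, 1) for _ in range(d)]
projections = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(m)]

coarse = coarse_quantize(key)
residual = [k - c for k, c in zip(key, coarse)]
residual_norm = math.sqrt(dot(residual, residual))
bits = sign_sketch(residual, projections)

exact = dot(query, key)
coarse_only = dot(query, coarse)
corrected = estimate_inner_product(query, coarse, bits, residual_norm, projections)
print(exact, coarse_only, corrected)
```

The point of the second stage in this sketch is that the sign-based correction is unbiased in expectation for inner products, which a quantizer tuned only for MSE does not guarantee. The sketch stores the residual norm alongside the sign bits; whether the paper stores it the same way is an assumption.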

vincnetas 8 hours ago | parent | prev [-]

I think the grid can be on the surface of the unit sphere.

mrugge 9 hours ago | parent | prev | next [-]

1. Efficient recursive transform of KV embeddings into polar coordinates.
2. Quantize the resulting angles without the need for explicit normalization.
This saves memory via a key insight: the angles follow a distribution with an analytical form.
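The two steps above can be sketched roughly as follows. This is a minimal illustration, assuming a power-of-two dimension and a plain uniform angle quantizer; a codebook matched to the analytical angle distribution would replace `quantize_angle`, and that part is not shown. All function names are hypothetical.

```python
import math

def to_polar(vec):
    """Recursively pair up values, keeping angles and passing radii upward."""
    levels = []  # levels[k] holds the angles produced at depth k
    radii = list(vec)
    while len(radii) > 1:
        angles, next_radii = [], []
        for a, b in zip(radii[0::2], radii[1::2]):
            angles.append(math.atan2(b, a))
            next_radii.append(math.hypot(a, b))
        levels.append(angles)
        radii = next_radii
    return radii[0], levels

def from_polar(radius, levels):
    """Invert to_polar: expand each radius back into a pair of values."""
    radii = [radius]
    for angles in reversed(levels):
        expanded = []
        for r, t in zip(radii, angles):
            expanded.extend((r * math.cos(t), r * math.sin(t)))
        radii = expanded
    return radii

def quantize_angle(theta, lo, hi, bits):
    """Uniform stand-in quantizer: snap theta to the midpoint of its bin."""
    n = 2 ** bits
    width = (hi - lo) / n
    idx = min(int((theta - lo) / width), n - 1)
    return lo + (idx + 0.5) * width

vec = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5, 0.1, -0.3]
radius, levels = to_polar(vec)

bits = 4  # hypothetical bit budget per angle
quantized = []
for depth, angles in enumerate(levels):
    # First-level angles span [-pi, pi]; deeper levels pair non-negative
    # radii, so their angles fall in [0, pi/2].
    lo, hi = (-math.pi, math.pi) if depth == 0 else (0.0, math.pi / 2)
    quantized.append([quantize_angle(t, lo, hi, bits) for t in angles])

approx = from_polar(radius, quantized)
error = math.dist(vec, approx)
print(f"norm={radius:.4f}  reconstruction error={error:.4f}")
```

Because only the angles are quantized and the single top-level radius is kept as-is, the reconstructed vector has the same norm as the original, which is one way to read "without the need for explicit normalization".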

quotemstr 9 hours ago | parent [-]

Reminds me vaguely of the Burrows-Wheeler transform in bzip2.

viktorcode 7 hours ago | parent | prev [-]

The way I understand it, it's a way of compressing vectors by switching from their per-component representation to a polar coordinate representation, where nearby vectors are clumped together onto a single line, so they can be described by their different lengths.