In the theoretical section, they extrapolate assuming a polynomial from 40 to thousands of dimensions. Why do they trust a polynomial fit to extrapolate two orders of magnitude? Why do we even think it's polynomial instead of exponential in the first place? Most things like this increase exponentially with dimension.

In fact, I think we can do it in d=2k dimensions, if we're willing to have arbitrarily precise query vectors.

Embed our points as (sin(theta), cos(theta), sin(2 x theta), cos(2 x theta)..., sin(k x theta), cos(k x theta)), with theta uniformly spaced around the circle, and we should be able to select any k of them.

Using a few more dimensions we can then ease the precision requirements on the query.

▲

namibj 3 days ago | parent | next [-]

In practice you're actually hitting further problems because you don't have those synthetic top-k tasks but rather open-domain documents and queries to support. And if you hope to get better than "just" having the top-k correct and instead get some sort of inclusion/exclusion boundary between what should be matched and what should not be matched, you'll hit the same bounds as apply to context length limitations for kq dimenionality in a transformer's attention layers, as I mentioned about 6 weeks ago: https://news.ycombinator.com/item?id=44570650

▲

yorwba 3 days ago | parent | prev [-]

I'm not following your construction. In the k=2 case, how do you construct your 4-dimensional query vector so that the dot product is maximized for two different angles theta and phi, but lower for any other arbitrary angle?

	▲	Straw 3 days ago \| parent [-]
		Viewing our space as two complex points instead of four real: Let H(z) = (z- exp(i theta)) (z - exp(i phi)) H(z) is zero only for our two points. We'll now take the norm^2, to get something that's minimized on only our two chosen points. \|H(exp(i x))\|^2 = H(z) x conj(H(z)) = sum from -2 to 2 of c_j x exp(i j x) For some c_j (just multiply it out). Note that this is just a Fourier series with highest harmonic 2 (or k in general), so in fact our c_j tell us the four coefficients to use. For a simpler, but numerically nastier construction, instead use (t, t^2, t^3, ...), the real moment curve. Then given k points, take the degree k poly with those zeros. Square it, negate, and we get a degree 2k polynomial that's maximized only on our selected k points. The coefficients of this poly are the coefficients of our query vector, as once expanded onto the moment curve we can evaluate polynomials by dot product with the coefficients. The complex version is exactly the same idea but with z^k and z ranging on the unit circle instead.