Author here — questions and pushback both welcome.

You should benchmark the retrieval speed of each method in terms of queries per second. I suspect that the gain in bandwidth you get from slightly better compression will be defeated by decompression being much more expensive.
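
A rough sketch of what such a benchmark could look like, assuming a toy lookup workload. The `queries_per_second` helper and the zlib-compressed store are hypothetical stand-ins, not the article's method; the point is only to time the full path, decompression included.

```python
import time
import zlib

def queries_per_second(answer_query, queries, repeats=5):
    """Best-of-`repeats` throughput of `answer_query` over `queries`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for q in queries:
            answer_query(q)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on a very coarse timer.
    return len(queries) / max(best, 1e-12)

# Hypothetical stand-in data: the same records stored raw and zlib-compressed.
records = [(f"doc-{i} " * 50).encode() for i in range(1000)]
raw_store = records
packed_store = [zlib.compress(r) for r in records]

queries = list(range(0, 1000, 7))
raw_qps = queries_per_second(lambda i: raw_store[i], queries)
packed_qps = queries_per_second(lambda i: zlib.decompress(packed_store[i]), queries)
print(f"raw: {raw_qps:,.0f} q/s, compressed: {packed_qps:,.0f} q/s")
```

Reporting both numbers next to the compression ratio would show whether the bandwidth saved is worth the extra decode time.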

afxuh 2 hours ago | parent | prev | next [-]

Cool idea, but it only works when the data never changes. Could you make a streaming/incremental version, one that updates the math cheaply when new data arrives instead of recomputing everything? Or does the math fundamentally prevent it?
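
To make the question concrete: if the method only needs the data's mean and covariance (true for PCA, an assumption for the article's method), those can be updated one sample at a time with a Welford-style rule. A minimal sketch, with the hypothetical class name `RunningCovariance`:

```python
class RunningCovariance:
    """Streaming mean and sample covariance via a Welford-style update."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        # m2[i][j] accumulates the sum of (x_i - mean_i) * (x_j - mean_j).
        self.m2 = [[0.0] * dim for _ in range(dim)]

    def update(self, x):
        """Fold one new sample in; cost is O(dim^2), independent of n."""
        self.n += 1
        delta = [xi - mi for xi, mi in zip(x, self.mean)]    # x - old mean
        self.mean = [mi + di / self.n for mi, di in zip(self.mean, delta)]
        delta2 = [xi - mi for xi, mi in zip(x, self.mean)]   # x - new mean
        for i, di in enumerate(delta):
            for j, dj in enumerate(delta2):
                self.m2[i][j] += di * dj

    def covariance(self):
        """Sample covariance; requires at least two samples."""
        return [[v / (self.n - 1) for v in row] for row in self.m2]


stats = RunningCovariance(dim=2)
for sample in [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]]:
    stats.update(sample)
print(stats.mean, stats.covariance())
```

Anything derived from the covariance, such as an eigendecomposition, would still have to be recomputed or refreshed, but that cost depends on the dimension and not on the number of samples seen.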

Devilstro 3 hours ago | parent | prev | next [-]

In the article, you mention that this approach requires no search over hyper-parameters, because the method has a closed-form solution using "simple" linear algebra. I agree with this, but do you not think you need to tune the L2-regularization strength? To me that is a hyper-parameter you would need to cross-validate over (or similar).
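
A toy illustration of the point, assuming plain ridge regression with one feature and no intercept (not the article's actual model): the fit is closed-form for any given strength, but choosing the strength still takes a search. The names `ridge_fit_1d`, `cv_error` and `pick_lambda` are made up for this sketch.

```python
import random

def ridge_fit_1d(xs, ys, lam):
    """Closed-form 1-D ridge without intercept: w = sum(x*y) / (sum(x*x) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def cv_error(xs, ys, lam, k=5):
    """Mean squared error of the ridge fit under k-fold cross-validation."""
    total = 0.0
    for fold in range(k):
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k != fold]
        held_out = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i % k == fold]
        w = ridge_fit_1d([x for x, _ in train], [y for _, y in train], lam)
        total += sum((y - w * x) ** 2 for x, y in held_out)
    return total / len(xs)

def pick_lambda(xs, ys, grid, k=5):
    """Return the regularization strength in `grid` with the lowest CV error."""
    return min(grid, key=lambda lam: cv_error(xs, ys, lam, k))

# Synthetic data for illustration only: y = 2x plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(40)]
ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
best_lam = pick_lambda(xs, ys, grid=[0.0, 0.01, 0.1, 1.0, 10.0])
print("chosen lambda:", best_lam)
```

Each fit is a single formula, so the search is cheap, but it is still a search over a hyper-parameter.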

stephantul an hour ago | parent | prev [-]

Really cool! I was investigating PCA on retrieval, thanks for the references.