janalsncm 2 months ago

> Most rerankers degrade substantially in quality over a few hundred candidates.

The reason we don’t run the most powerful models on thousands or millions of candidates is latency, not quality. It’s the same reason we use ANN search rather than computing exact cosine similarity against every doc in the index.
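
A minimal sketch of the latency argument, assuming a two-stage pipeline in which a cheap scorer sees every document and a costly scorer sees only the top k. Both `cheap_score` and `expensive_score` are hypothetical stand-ins, not real models, and the vectors are toy data. The point is only that the number of expensive calls is bounded by k rather than by the index size.

```python
import heapq

def cheap_score(query, doc):
    # Hypothetical fast first-pass signal: a plain dot product.
    return sum(q * d for q, d in zip(query, doc))

def expensive_score(query, doc, counter):
    # Hypothetical stand-in for a heavy reranker; we count every call.
    counter["calls"] += 1
    return -sum((q - d) ** 2 for q, d in zip(query, doc))

def two_stage(query, docs, k):
    # Stage 1: cheap scorer over the whole index, keep the top k doc indices.
    candidates = heapq.nlargest(
        k, range(len(docs)), key=lambda i: cheap_score(query, docs[i])
    )
    # Stage 2: expensive scorer over the k survivors only.
    counter = {"calls": 0}
    reranked = sorted(
        candidates,
        key=lambda i: expensive_score(query, docs[i], counter),
        reverse=True,
    )
    return reranked, counter["calls"]

# Toy data, made up for illustration.
docs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]
reranked, calls = two_stage([1.0, 0.0], docs, k=2)
```

Here the expensive scorer runs twice even though the index holds four documents; with a real index the gap is what makes the second stage affordable.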

nostrebored 2 months ago

This isn’t true. You can look at basically every cross-encoder in use today and observe precision degrading as k increases.

Of course latency matters for retrieval pipelines, and that is another reason to care. But first-pass retrieval has to surface the right candidates for any of it to matter at all, and it has to do so within the limits set by how the reranker’s precision degrades with respect to k.
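
A rough sketch of how one might measure the two quantities in play here, assuming labeled relevance judgments are available. The helper names `recall_at_k` and `precision_at_n` and the document ids are hypothetical. First-pass recall at k caps what any reranker can recover, and precision of the reranked list can then be tracked as k grows.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    # Fraction of all relevant docs that appear in the first k results.
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def precision_at_n(ranked_ids, relevant_ids, n):
    # Fraction of the first n results that are relevant.
    top = ranked_ids[:n]
    if not top:
        return 0.0
    return sum(1 for doc_id in top if doc_id in relevant_ids) / len(top)

# Toy first-pass ranking and labels, made up for illustration.
first_pass = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d4", "d8"}

# Recall ceiling the reranker inherits at each candidate count k.
ceilings = {k: recall_at_k(first_pass, relevant, k) for k in (2, 3, 5)}
```

In this toy ranking, k=2 surfaces no relevant document, so no reranker could help; running `precision_at_n` on a reranker’s output at several values of k is one way to check the degradation claim on your own data.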

supo 2 months ago

By that same logic, why would you not strive to push all the signals you have available into the ANN search? Sure, some will have reduced resolution compared with a heavy reranker, but surely the optimal solution is to use the same signals in both stages and just add resolution in the second stage? The more the two stages are aligned, the fewer candidates you need -> better latency & lower cost.
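
A toy sketch of the "same signal, more resolution in stage two" idea, under the assumption that the first stage scores a coarsely quantized copy of the very vectors the second stage scores at full precision. The `quantize` helper, the step size, and the data are all hypothetical and chosen only for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quantize(vec, step=0.5):
    # Reduced-resolution copy of the same signal (snap to a coarse grid).
    return [round(x / step) * step for x in vec]

def retrieve(query, docs, k, step=0.5):
    # Stage 1: rank by the coarse signal, keep k candidate indices.
    coarse = sorted(
        range(len(docs)),
        key=lambda i: dot(query, quantize(docs[i], step)),
        reverse=True,
    )[:k]
    # Stage 2: rerank the survivors with the full-resolution signal.
    return sorted(coarse, key=lambda i: dot(query, docs[i]), reverse=True)

# Toy vectors, made up for illustration.
docs = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.4, 0.4]]
query = [1.0, 0.2]

top_with_k2 = retrieve(query, docs, k=2)
top_with_k1 = retrieve(query, docs, k=1)
```

In this example the coarse stage misorders the two best documents, so k=1 returns the wrong one, but k=2 is already enough for the full-resolution pass to recover the true best match. How small k can get on real data depends on how closely the two stages agree.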