| ▲ | nostrebored 2 months ago |
| That "much smaller number" is the tricky part. Most rerankers degrade substantially in quality over a few hundred candidates. No amount of powerful rerankers will make "high powered behavior based models" more effective. Those behavioral signals and intents have to be encoded in the query and the latent space. |
|
| ▲ | janalsncm 2 months ago | parent [-] |
| > Most rerankers degrade substantially in quality over a few hundred candidates.
|
| The reason we don’t run the most powerful models over thousands/millions of candidates is latency, not quality. It’s the same reason we use ANN search rather than exhaustive cosine similarity over every doc in the index. |
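That latency tradeoff looks roughly like this with faiss (a sketch, assuming vectors are normalized so inner product equals cosine similarity; index parameters are illustrative):

    import numpy as np
    import faiss

    d, n = 128, 100_000
    docs = np.random.randn(n, d).astype("float32")
    faiss.normalize_L2(docs)              # unit norm: inner product == cosine sim

    exact = faiss.IndexFlatIP(d)          # brute force: scores every doc, O(n) per query
    exact.add(docs)

    ann = faiss.IndexHNSWFlat(d, 32, faiss.METRIC_INNER_PRODUCT)  # approximate graph search
    ann.add(docs)

    q = np.random.randn(1, d).astype("float32")
    faiss.normalize_L2(q)
    _, exact_ids = exact.search(q, 100)   # exhaustive cosine sim over the index
    _, ann_ids = ann.search(q, 100)       # same API, far lower latency, slight recall loss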
| ▲ | nostrebored 2 months ago | parent | next [-] | | This isn’t true. You can look at basically every cross-encoder in use today and observe degradations in precision as k increases. Of course latency matters for retrieval pipelines, and that’s another reason to care. But first-pass retrieval has to surface the right candidates for any of this to matter at all, and it has to do so within the constraints of the reranker’s precision degradation with respect to k. | |
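A sketch of that constraint using sentence-transformers (the model names are just common public checkpoints, not a recommendation): the cross-encoder only ever sees the k_first candidates, so the pipeline is capped by both first-pass recall and the reranker's precision-vs-k curve.

    from sentence_transformers import SentenceTransformer, CrossEncoder

    bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
    cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    corpus = ["first doc ...", "second doc ...", "third doc ..."]
    doc_embs = bi_encoder.encode(corpus, normalize_embeddings=True)

    def search(query: str, k_first: int = 100, k_final: int = 10):
        # First pass: cheap dot-product retrieval. If the right doc is not
        # in the top k_first, no reranker can recover it.
        q = bi_encoder.encode(query, normalize_embeddings=True)
        candidates = (doc_embs @ q).argsort()[::-1][:k_first]
        # Second pass: cross-encoder scores (query, doc) pairs jointly.
        # k_first is the budget where both precision and latency degrade.
        pairs = [(query, corpus[i]) for i in candidates]
        rerank = cross_encoder.predict(pairs)
        order = rerank.argsort()[::-1][:k_final]
        return [corpus[candidates[i]] for i in order]

Shrinking k_first buys reranker precision and latency at the cost of first-pass recall, which is exactly the tension being argued about.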
| ▲ | supo 2 months ago | parent | prev [-] | | By that same logic, why would you not strive to push all the signals you have available into the ANN search? Sure, some will have reduced resolution versus a heavy reranker, but surely the optimal solution is to use the same signals in both stages and just add resolution in the second stage? The more aligned the two stages are, the fewer candidates you need -> better latency and lower cost. | |
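One way to read that proposal (a toy sketch, not a claim about any production system): make the first pass a low-resolution version of the very same scoring signal, here via a PCA-style projection, so the two stages agree and the candidate budget can shrink.

    import numpy as np

    rng = np.random.default_rng(0)
    full = rng.standard_normal((10_000, 256)).astype("float32")  # full-res doc vectors
    full /= np.linalg.norm(full, axis=1, keepdims=True)

    # Low-res view derived from the same vectors: top 32 principal directions.
    _, _, vt = np.linalg.svd(full[:2_000], full_matrices=False)
    proj = vt[:32].T                       # 256 -> 32 dims, same underlying signal
    coarse = full @ proj

    def search(q_full: np.ndarray, k_first: int = 200, k_final: int = 10):
        q_full = q_full / np.linalg.norm(q_full)
        # Stage 1: identical signal at low resolution -> cheap, slightly noisy.
        cand = (coarse @ (q_full @ proj)).argsort()[::-1][:k_first]
        # Stage 2: identical signal at full resolution, over the short list only.
        fine = full[cand] @ q_full
        return cand[fine.argsort()[::-1][:k_final]]

    hits = search(rng.standard_normal(256).astype("float32"))

In this toy the stages rank with the same signal, so the coarse top-200 almost always contains the true top-10; that alignment is what lets you cut k_first and with it latency and cost.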
|