janalsncm 2 months ago

I don’t think the author understands the purpose of reranking.

During vector retrieval, we retrieve documents in sublinear time from a vector index. This allows us to reduce the number of documents from potentially billions to a much smaller number. The purpose of re-ranking is to allow high powered models to evaluate docs much more closely.
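The two-stage pattern described here can be sketched in a few lines. This is a toy illustration, not any particular system: the first pass is a crude token-overlap retriever (sublinear via an index in real systems; a linear scan over a toy corpus here), and the second pass stands in for an expensive reranking model that only ever sees the short candidate list.

```python
def cheap_retrieve(query, docs, k=3):
    """First pass: crude token-overlap score to narrow the corpus."""
    q = set(query.lower().split())
    scored = [(len(q & set(d.lower().split())), d) for d in docs]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [d for _, d in scored[:k]]

def heavy_rerank(query, candidates):
    """Second pass: stand-in for an expensive model, run on few docs."""
    q = query.lower().split()
    def score(d):
        words = d.lower().split()
        # a (slightly) richer signal: term frequency normalized by length
        return sum(words.count(t) for t in q) / (len(words) + 1)
    return sorted(candidates, key=score, reverse=True)

docs = [
    "vector search retrieves candidates quickly",
    "rerankers score candidates with heavy models",
    "unrelated document about cooking pasta",
]
candidates = cheap_retrieve("reranking candidates with models", docs)
ranked = heavy_rerank("reranking candidates with models", candidates)
```

The point is the asymmetry: the cheap pass touches everything, the heavy pass touches almost nothing.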

It is true that we can attempt to distill that reranking signal into a vector index. Most search engines already do this. But there is no replacement for using the high powered behavior based models in reranking.

_QrE 2 months ago | parent | next [-]

I agree.

> "The real challenge in traditional vector search isn't just poor re-ranking; it's weak initial retrieval. If the first layer of results misses the right signals, no amount of re-sorting will fix it. That's where Superlinked changes the game."

Currently a lot of RAG pipelines use the BM25 algorithm for retrieval, which is very good. You then use an agent to rerank stuff only after you've got your top 5-25 results, which is not that slow or expensive, if you've done a good job with your chunking. Using metadata is also not really a 'new' approach (well, in LLM time at least) - it's more about what metadata you use and how you use them.
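For reference, the BM25 scoring mentioned above fits in a few lines. Real pipelines would use Elasticsearch, Lucene, or a library like rank_bm25; this toy version just shows the Okapi BM25 formula itself, with the standard k1 and b defaults.

```python
import math

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25: rewards rare terms, saturates term frequency,
    and normalizes by document length."""
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    scores = []
    for doc in tokenized:
        s = 0.0
        for term in query.lower().split():
            n_t = sum(1 for d in tokenized if term in d)  # doc frequency
            idf = math.log((N - n_t + 0.5) / (n_t + 0.5) + 1)
            f = doc.count(term)                           # term frequency
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores

docs = [
    "bm25 ranks documents by term frequency and rarity",
    "vector embeddings capture semantic similarity",
    "chunking strategy matters for rag pipelines",
]
scores = bm25_scores("bm25 term frequency", docs)
best = max(range(len(docs)), key=lambda i: scores[i])
```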

nostrebored 2 months ago | parent [-]

If this were true, and initial candidate retrieval were a solved problem, companies where search is revenue-aligned wouldn't employ teams of very well paid people looking for marginal improvements here.

Treating BM25 as a silver bullet is just as strange as treating vector search as the "true way" to solve retrieval.

_QrE 2 months ago | parent [-]

I don't mean to imply that it's a solved problem; all I'm saying is that in a lot of cases, the "weak initial retrieval" assertion stated by the article is not true. And if you can get a long way using what has now become the industry standard, there's not really a case to be made that BM25 is bad/unsuited, unless the improvement you gain from something more complex is more than just marginal.

supo 2 months ago | parent [-]

One thing to remember is that BM25 is purely in the domain of text: the moment any other signal enters the picture (and it ~always does in sufficiently important systems), BM25 alone can literally have zero recall.

laszlo_cravens 2 months ago | parent | prev | next [-]

I agree as well. Especially in the context of recommendation systems, decoupling retrieval from a heavy ranker has a lot of benefits. It allows for 1) faster experimentation, and 2) the use of different retrieval sources. In reality, the retrieval stage might consist of a healthy mix of different algorithms (collaborative filtering, personalized PageRank, word2vec/two-tower embeddings, popular items near the user, etc.) and fallback heuristics.
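The "healthy mix" idea can be sketched as follows. This is a hypothetical illustration (the source names and helpers are made up): each source returns its own ranked list, and the candidate pool is the deduplicated union, capped before it reaches the heavy ranker.

```python
def mixed_retrieval(query, user, sources, pool_size=100):
    """Each source is a callable returning a ranked list of item ids;
    merge in source-priority order, dropping duplicates."""
    seen, pool = set(), []
    for retrieve in sources:
        for item in retrieve(query, user):
            if item not in seen:
                seen.add(item)
                pool.append(item)
            if len(pool) >= pool_size:
                return pool
    return pool

# Toy stand-ins for collaborative filtering, embeddings, popularity:
cf = lambda q, u: ["item_a", "item_b"]
emb = lambda q, u: ["item_b", "item_c"]
popular = lambda q, u: ["item_d", "item_a"]

pool = mixed_retrieval("some query", "user42", [cf, emb, popular])
```

Because each source is independent, a new retriever can be A/B tested by adding it to the list without touching the ranker.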

supo 2 months ago | parent [-]

It allows faster experimentation because you can't do things like partial embedding updates and reasonable schema migrations on your vector search index. If you could, you'd experiment in retrieval... And with better retrieval you don't have to move hundreds or thousands of candidates out of a database and pay a ton for ranker inference on every query (not even mentioning the latency impact of that).

nostrebored 2 months ago | parent | prev | next [-]

That "much smaller number" is the tricky part. Most rerankers degrade substantially in quality beyond a few hundred candidates, and no amount of reranking power helps if the right candidates never reach the reranker. Those behavioral signals and intents have to be encoded in the query and the latent space.

janalsncm 2 months ago | parent [-]

> Most rerankers degrade substantially in quality over a few hundred candidates.

The reason we don’t use the most powerful models on thousands/millions of candidates is because of latency, not quality. It’s the same reason we use ANN search rather than cosine sim for every doc in the index.
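The ANN-versus-exact-search trade-off mentioned here is easy to see in code. Exact search scores the query against every document, so it is O(N) per query; ANN indexes exist precisely to avoid that scan. A toy brute-force cosine search, for comparison:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def brute_force_search(query_vec, index, k=2):
    """Exact nearest neighbours: one cosine per indexed doc (O(N))."""
    scored = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

index = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.9, 0.1, 0.0],
    "doc3": [0.0, 1.0, 0.0],
}
top = brute_force_search([1.0, 0.05, 0.0], index)
```

At billions of documents this per-query scan is infeasible, which is why ANN structures (HNSW, IVF, etc.) trade a little recall for sublinear lookup.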

nostrebored 2 months ago | parent | next [-]

This isn’t true. You can look at basically every cross-encoder used today and observe degradations in precision with increases in k.

Of course latency matters for retrieval pipelines, and that's another reason to care. But first-pass retrieval has to surface the right candidates for any of it to matter, and it has to do so within the constraints of the reranker's precision degradation with respect to k.

supo 2 months ago | parent | prev [-]

By that same logic, why would you not strive to push all the signals you have available into the ANN search? Sure, some will have reduced resolution versus a heavy reranker, but surely the optimal solution is to use the same signals in both stages and just add resolution in the second stage? The more the two stages are aligned, the fewer candidates you need, which means better latency and lower cost.

supo 2 months ago | parent | prev [-]

If you could wave a magic wand and push all the ranking signals into retrieval and that index would be fast to update and not that expensive to operate - you would do that and you would delete all your reranking systems, wouldn't you?