I agree.

> "The real challenge in traditional vector search isn't just poor re-ranking; it's weak initial retrieval. If the first layer of results misses the right signals, no amount of re-sorting will fix it. That's where Superlinked changes the game."

Currently a lot of RAG pipelines use the BM25 algorithm for retrieval, which is very good. You then use an agent to rerank stuff only after you've got your top 5-25 results, which is not that slow or expensive, if you've done a good job with your chunking. Using metadata is also not really a 'new' approach (well, in LLM time at least) - it's more about what metadata you use and how you use them.

▲

nostrebored 2 months ago | parent [-]

If this were true, and initial candidate retrieval were a solved problem, teams where search is revenue aligned wouldn't have teams of very well paid people looking for marginal improvement here.

Treating BM25 as a silver bullet is just as strange as treating vector search as the "true way" to solve retrieval.

▲

_QrE 2 months ago | parent [-]

I don't mean to imply that it's a solved problem; all I'm saying is that in a lot of cases, the "weak initial retrieval" assertion stated by the article is not true. And if you can get a long way using what has now become the industry standard, there's not really a case to be made that BM25 is bad/unsuited, unless the improvement you gain from something more complex is more than just marginal.

	▲	supo a month ago \| parent [-]
		one thing to remember is that bm25 is purely in the domain of text - the moment any other signal enters in the picture (and it ~always does in sufficiently important systems), bm25 alone can literally have 0 recall.