_QrE a month ago

I don't mean to imply that it's a solved problem; all I'm saying is that in a lot of cases, the article's "weak initial retrieval" claim doesn't hold. And if you can get a long way with what has become the industry standard, there's not much of a case that BM25 is bad or unsuited, unless the gain from something more complex is more than marginal.

supo a month ago

One thing to remember is that BM25 lives purely in the domain of text. The moment any other signal enters the picture (and it ~always does in sufficiently important systems), BM25 alone can literally have 0 recall.
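
To make the zero-recall case concrete, here is a toy sketch. Everything in it is made up for illustration: the three-document corpus, the query, and the idea that a co-purchase signal marks the foam roller as the relevant item. The scorer is a plain Okapi-style BM25 written from scratch with whitespace tokenization, not any particular library's implementation.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a basic Okapi BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue  # a term that is absent contributes nothing
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical catalog and query.
docs = [
    "lightweight running shoes for road races",
    "trail running shoes with extra grip",
    "foam roller for post workout recovery",
]
query = "running shoes"

# Assumption: a non-text signal (say, co-purchase data) marks doc 2 as relevant.
relevant = {2}

scores = bm25_scores(query, docs)
# BM25 can only surface documents that share at least one query term.
candidates = [i for i, s in enumerate(scores) if s > 0]
recall = len(relevant & set(candidates)) / len(relevant)

print(candidates)
print(recall)
```

The foam roller shares no term with the query, so its BM25 score is exactly zero and no amount of tuning `k1` or `b` brings it into the candidate set. A later reranker never sees it either, which is why the non-text signal has to feed the retrieval stage itself.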