emschwartz 6 days ago

Most of the commercial and open source offerings for hybrid search seem to be using BM25 + vector similarity search based on embeddings. The results are combined using Reciprocal Rank Fusion (RRF).

The RRF paper is impressive in how incredibly simple it is (the paper is only 2 pages): https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf
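
For anyone who hasn't read it, the whole method fits in a few lines: each document's fused score is the sum of 1 / (k + rank) over every result list it appears in, with k = 60 as the constant the paper uses. Here is a minimal sketch; the function name `rrf` and the toy document ids are my own placeholders, not from any particular library:

```python
from collections import defaultdict


def rrf(rankings, k=60):
    """Fuse several ranked lists of doc ids with Reciprocal Rank Fusion.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (60 is the value used in the RRF paper).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, so the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a BM25 query and a vector query.
bm25_hits = ["a", "b", "c"]
vector_hits = ["c", "a", "d"]
fused = rrf([bm25_hits, vector_hits])
print(fused)
```

Note that only ranks go in, never the raw BM25 or cosine scores, which is why it works without any score normalization.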

softwaredoug 6 days ago | parent | next [-]

A warning that RRF is often not enough, as it can just drag a good solution down towards the worse solution :)

https://softwaredoug.com/blog/2024/11/03/rrf-is-not-enough
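
A toy illustration of the effect, with made-up data (not taken from the post): one retriever puts both relevant docs on top, the other mostly returns noise, and the fused top 2 ends up worse than the good retriever alone. The names `rrf` and `precision_at` and all doc ids are hypothetical:

```python
from collections import defaultdict


def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: sum 1 / (k + rank) across ranked lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def precision_at(n, ranking, relevant):
    """Fraction of the top-n results that are in the relevant set."""
    return sum(1 for d in ranking[:n] if d in relevant) / n


relevant = {"r1", "r2"}
good = ["r1", "r2", "x1", "x2"]  # strong retriever: both relevant docs on top
bad = ["n1", "n2", "n3", "r1"]   # weak retriever: mostly noise

fused = rrf([good, bad])
good_p2 = precision_at(2, good, relevant)
fused_p2 = precision_at(2, fused, relevant)
print(good_p2, fused_p2)
```

The weak retriever's top hit gets the same 1 / (k + 1) contribution as the strong retriever's top hit, so it pushes its way into the fused top 2 and displaces a relevant doc.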

emschwartz 6 days ago | parent [-]

Ah, that's great! Thanks for sharing that.

I had actually implemented full text search + vector search using RRF but I kept it disabled by default because it wasn't meaningfully improving my results. This seems like a good hypothesis as to why.
