throwaway81523 6 hours ago

Gad, they sure like to say "BM25" over and over again. That's a near worthless approach to result ranking. Doing any halfway ok job requires much more tuned and/or more powerful approaches.

throwaway7783 6 hours ago | parent | next [-]

Can you please elaborate on why?

cpursley 6 hours ago | parent | prev [-]

It's common to do a hybrid of BM25 with other fuzzy search or pgvector.
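For illustration only, here is a minimal sketch of one way such a hybrid can be put together: reciprocal rank fusion (RRF) over two already-ranked result lists. The comment does not name a fusion method, so RRF is an assumption, as are the function name, the sample document ids, and the constant k=60 (a commonly used default). The two input lists stand in for whatever a BM25 query and a pgvector similarity query would return.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list, best first.

    Hypothetical helper for illustration; k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # rank is 1-based: the top hit in each list contributes 1 / (k + 1)
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))


# Made-up result lists standing in for a BM25 query and a vector query
bm25_ranking = ["d1", "d2", "d3"]
vector_ranking = ["d3", "d1", "d4"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

Documents that appear high in both lists rise to the top, and the method only needs ranks, so the BM25 and vector scores never have to be put on a common scale.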

storus 5 hours ago | parent [-]

BM25 is quite bad and needs to be retrained for each corpus anew. SPLADEv2 is much better and there are even better sparse embeddings these days.
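As a rough illustration of where the corpus dependence comes from, here is a minimal BM25 scoring sketch. It assumes whitespace tokenization, the Lucene-style IDF variant, and the common defaults k1=1.2 and b=0.75; the function name and sample documents are made up. Note that BM25 has no learned weights in the usual sense: what changes per corpus are the document frequencies, the average document length, and whatever values of k1 and b you tune.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a basic BM25 formula.

    Hypothetical helper for illustration; k1 and b are assumed defaults.
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Corpus-level statistics: these change whenever the corpus changes
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for t in tokenized for term in set(t))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["the cat sat on the mat", "the dog barked loudly", "cat and dog"]
scores = bm25_scores("cat", docs)
print(scores)
```

With the same term count, the shorter document scores higher because of length normalization, and a document without the query term scores zero.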