tkfoss 3 days ago

Wouldn't the holy grail then be parallel channels for candidate generation:

  euclidean embedding
  hyperbolic embedding
  sparse BM25 / SPLADE lexical search
  optional multi-vector signatures

  ↓ merge & deduplicate candidates
followed by weighted scoring, graph expansion & LLM reranking?
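One rank-based way to do the "merge & deduplicate" step is reciprocal rank fusion (RRF). RRF is my swap-in here; the comment doesn't say how to merge. Here's a minimal Python sketch. It assumes each channel already returns a best-first list of doc IDs with no repeats inside a list. The channel names and `rrf_merge` are hypothetical, and k=60 is the constant commonly used with RRF. Because it only looks at ranks, it avoids putting euclidean distances, hyperbolic distances and BM25 scores on one scale.

```python
from collections import defaultdict


def rrf_merge(channel_results: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Merge ranked candidate lists from several retrieval channels (RRF).

    channel_results maps a (hypothetical) channel name to that channel's
    doc IDs, best first. Each list is assumed to contain no duplicates.
    A doc found by several channels appears once in the output
    (deduplication) but accumulates 1 / (k + rank) from every channel.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in channel_results.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


candidates = rrf_merge({
    "euclidean": ["d1", "d2", "d3"],
    "hyperbolic": ["d2", "d4"],
    "bm25": ["d3", "d2", "d5"],
})
top_ids = [doc_id for doc_id, _ in candidates]  # → ["d2", "d3", "d1", "d4", "d5"]
```

Graph expansion and the LLM rerank would then run on the head of this list. Neither is shown here.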
jdthedisciple 3 days ago

That is pretty much exactly what we do for our company-internal knowledge retrieval:

    embedding search (0.4)
    lexical/keyword search (0.4)
    fuzzy search (0.2)
It might indeed achieve the best of all worlds.
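Here's a minimal Python sketch of how a weighted blend like that could be computed, using the 0.4 / 0.4 / 0.2 weights above. The comment doesn't state the following, so these are assumptions:

- every channel returns doc ID → raw score, with higher meaning better;
- scores are min-max normalized per channel before weighting;
- a doc missing from a channel gets 0 for that channel.

`weighted_fusion`, `min_max_normalize` and the channel names are hypothetical.

```python
def min_max_normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one channel's raw scores to [0, 1] so channels are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # all hits scored equally: treat each as a full match
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_fusion(
    channel_scores: dict[str, dict[str, float]],
    weights: dict[str, float],
) -> list[tuple[str, float]]:
    """Weighted sum of per-channel normalized scores, best first.

    A doc absent from a channel simply gets no contribution (0) from it.
    """
    fused: dict[str, float] = {}
    for channel, weight in weights.items():
        normalized = min_max_normalize(channel_scores.get(channel, {}))
        for doc_id, s in normalized.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * s
    # Highest fused score first; ties broken by doc ID for determinism
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


WEIGHTS = {"embedding": 0.4, "lexical": 0.4, "fuzzy": 0.2}

ranked = weighted_fusion(
    {
        "embedding": {"a": 0.91, "b": 0.85, "c": 0.40},  # e.g. cosine similarity
        "lexical": {"b": 12.3, "d": 7.1},                # e.g. BM25 score
        "fuzzy": {"a": 0.9, "d": 0.6},                   # e.g. edit-distance ratio
    },
    WEIGHTS,
)
ranked_ids = [doc_id for doc_id, _ in ranked]  # → ["b", "a", "c", "d"]
```

Min-max normalization maps each channel's weakest hit to 0. In the example, "d" gets nothing from the lexical channel even though it matched. With only a few hits per channel that can be harsh, so z-score normalization or a rank-based fusion are common alternatives.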