| nilirl 4 hours ago | |
Why is it implicit that semantic search will outperform lexical search? Back in 2023, when I compared semantic search to lexical search (tantivy; BM25), I found the results to be only marginally different. Even if semantic search has slightly better recall, does the problem of context warrant this multi-component, homebrew search engine approach? By what important measure does it outperform a lexical search engine? Is the engineering time worth it? | ||
| kgeist an hour ago | |
It depends on how you test it. I recently found that the way devs test it differs radically from how users actually use it. When we first built our RAG, it showed promising results (around 90% recall on large knowledge bases). However, when the first actual users tried it, it could barely answer anything (closer to 30%). It turned out we relied on exact keywords too much when testing it: we knew the test knowledge base, so we formulated our questions in a way that helped the RAG find what we expected it to find. Real users don't know the exact terminology used in the articles. We had to rethink the whole thing. Lexical search is certainly not enough. Sure, you can run an agent on top of it, but that blows up latency - users aren't happy when they have to wait more than a couple of seconds. | ||
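To make the gap described above concrete, here is a minimal sketch of measuring recall@k separately for dev-written queries and user-style paraphrases. Everything in it is an assumption for illustration: the function names `recall_at_k` and `mean_recall_at_k` are hypothetical, and the document ids and result lists are made-up toy data, not numbers from the system discussed in this comment.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant doc ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)


def mean_recall_at_k(results, k):
    """Average recall@k over (retrieved_ids, relevant_ids) pairs, one pair per query."""
    if not results:
        return 0.0
    return sum(recall_at_k(ret, rel, k) for ret, rel in results) / len(results)


# Toy data (hypothetical): each pair is (ids the retriever returned, ids that answer the query).
# Dev-written queries reuse the articles' own terminology, so the right doc is found.
dev_results = [
    (["a1", "a2", "a3"], ["a1"]),
    (["b4", "b1", "b2"], ["b1"]),
]
# User-style paraphrases of the same questions; the first one misses the right doc.
user_results = [
    (["a7", "a2", "a3"], ["a1"]),
    (["b4", "b1", "b2"], ["b1"]),
]

dev_recall = mean_recall_at_k(dev_results, k=3)
user_recall = mean_recall_at_k(user_results, k=3)
print(f"dev queries: {dev_recall:.2f}, user-style queries: {user_recall:.2f}")
```

The point of keeping two query sets is that the second one should be written by people who have not read the knowledge base, so the score reflects vocabulary mismatch rather than the testers' familiarity with the articles.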
| mips_avatar 3 hours ago | |
It depends on how important exact keyword matching is to your app compared with fuzzier, more ambiguous queries. In Wanderfugl there are a bunch of queries where semantic search can find an important chunk that lacks a high BM25 score. The good news is you can get all the benefits of BM25 and semantic search with a hybrid ranking. The answer isn’t one or the other. | ||
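One common way to build a hybrid ranking is reciprocal rank fusion (RRF), sketched below. This is not necessarily what Wanderfugl does; the comment does not say which fusion method is used. The function name `reciprocal_rank_fusion`, the document ids, and the two input rankings are hypothetical, and `k=60` is just a commonly used default for the smoothing constant.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list, best first.

    Each doc scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Only ranks are used, so BM25 scores and
    embedding similarities never need to be put on the same scale.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of a lexical (BM25) retriever and a semantic retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
semantic_ranking = ["doc_d", "doc_a", "doc_c"]

fused = reciprocal_rank_fusion([bm25_ranking, semantic_ranking])
print(fused)
```

Documents that both retrievers agree on rise to the top, while a chunk found only by the semantic side (here `doc_d`) still makes the fused list instead of being dropped for lacking a BM25 score.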
| andoando 3 hours ago | |
The benefit I see is that you can have queries like "conversations between two scientists". It's very dependent on the use case, imo. | ||