| ▲ | Der_Einzige 4 hours ago | |
This is true in general with LLMs, not just for code. LLMs can be told that their RAG tool is using BM25+N-grams, and will search accordingly. keyword search is superior to embeddings based search. The moment google switched to bert based embeddings for search everyone agreed it was going down hill. Most forms of early enshittification were simply switching off BM25 to embeddings based search. BM25/tf-idf and N grams have always been extremely difficult to beat baselines in information retrieval. This is why embeddings still have not led to a "ChatGPT" moment in information retrieval. | ||