mickeyp 2 hours ago
It doesn't help that academia loooves ColBERT and will happily tell you how amazing the models are at seemingly everything (and, look, for how tiny they are, 20M params and super fast on a CPU, they are), if only you...

- Chunk properly;
- Elide "obviously useless files" that give mixed signals;
- Re-rank and rechunk the whole files for top-scoring matches;
- Throw in a little BM25 but with better stemming;
- Carry around a list of preferred files and ideally also terms to help re-rank;

And so on. Works great when you're an academic benchmaxing your toy Master's project. Try building a scalable vector search that runs on any codebase without knowing anything at all about it, and getting a decent signal out of it. Ha.
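To make the "little BM25" and "preferred files" items concrete, here is a rough sketch of that kind of per-codebase tuning. It is not ColBERT and not anyone's actual pipeline: the function names, the `preferred_files` set and the `boost` factor are all made up for illustration, the tokenizer does no stemming, and the scoring is plain Okapi BM25 in the standard library only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens only; a real pipeline would add stemming here.
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    query_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def rerank(query, chunks, preferred_files=(), boost=1.5):
    """chunks is a list of (path, text); returns (path, score), best first."""
    scores = bm25_scores(query, [text for _, text in chunks])
    ranked = []
    for (path, _), score in zip(chunks, scores):
        if path in preferred_files:
            score *= boost  # hypothetical hand-tuned prior for this codebase
        ranked.append((path, score))
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

The catch is the `preferred_files` argument: someone has to know the codebase well enough to fill it in, which is exactly the knowledge a generic search tool doesn't have.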