jankovicsandras 7 days ago
Shameless plug:
softwaredoug 6 days ago
If we're shamelessly plugging passion projects, SearchArray is a pandas extension for full-text (BM25) search, for dorking around with things in Google Colab: https://github.com/softwaredoug/searcharray I'll also plug Xing Han Lu's BM25S, which is very popular and has similar goals:
mark_l_watson 6 days ago
Thanks, yesterday I was thinking of adding BM25 to a little side project, so a well-timed plug! Do you know of any pure Python wrapper projects for managing large numbers of text and PDF documents? I thought of using Solr or Elasticsearch, but that seems too heavyweight for what I am doing. I am considering using SQLite with pysqlite3 and PyPDF2, since SQLite's FTS5 full-text search supports BM25 ranking. Sorry to be off topic, but I imagine many people are looking at tools for building hybrid BM25 / vector store / LLM applications.
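
For anyone weighing the SQLite route mentioned above, here is a minimal sketch of BM25-ranked search using the FTS5 extension through Python's standard `sqlite3` module. It assumes your SQLite build has FTS5 compiled in (most do, but not all), and that the document text has already been extracted; PDF extraction is left out. The table name `docs`, the helpers `build_index` and `search`, and the sample data are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (path, already-extracted text) pairs.
SAMPLE_DOCS = [
    ("notes/a.txt", "BM25 ranking in SQLite full-text search"),
    ("notes/b.txt", "vector stores and embeddings for LLM applications"),
    ("papers/c.pdf", "hybrid search combines BM25 with vector similarity"),
]


def build_index(docs):
    """Create an in-memory FTS5 index from (path, text) pairs."""
    conn = sqlite3.connect(":memory:")
    # 'path' is stored but not tokenized; 'body' is the searchable column.
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(path UNINDEXED, body)")
    conn.executemany("INSERT INTO docs (path, body) VALUES (?, ?)", docs)
    conn.commit()
    return conn


def search(conn, query, limit=5):
    """Return (path, score) rows for an FTS5 query, best matches first."""
    # FTS5's bm25() yields smaller (more negative) values for better
    # matches, so an ascending sort puts the best hits first.
    rows = conn.execute(
        "SELECT path, bm25(docs) AS score FROM docs "
        "WHERE docs MATCH ? ORDER BY score LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


conn = build_index(SAMPLE_DOCS)
for path, score in search(conn, "bm25"):
    print(path, score)
```

With a file-backed database instead of `:memory:`, the same table could hold text pulled from PDFs by whatever extractor you prefer; the scores are only meaningful relative to each other within a single query.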