pauldix 2 days ago
I believe you could do this effectively with COBS (COmpact Bit Sliced signature index): https://panthema.net/2019/1008-COBS-A-Compact-Bit-Sliced-Sig... It's a pretty neat algorithm from a 2019 paper, designed "to index k-mers of DNA samples or q-grams from text documents". You take a collection of Bloom filters, one built per document, and combine them into a single structure that tells you which docs may contain a given term. Like an inverted index meets a Bloom filter. I'm using it in a totally different domain for an upcoming release of InfluxDB (time series database). There's also code online here: https://github.com/bingmann/cobs
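To make the "inverted index meets a Bloom filter" idea concrete, here is a rough Python sketch of a plain bit-sliced signature index. It is not the COBS implementation or its API, and it leaves out the compaction that COBS adds on top. The names (`NUM_BITS`, `NUM_HASHES`, `bit_positions`, `build_index`, `query`) and the parameter values are made up for illustration.

```python
import hashlib

# Illustrative parameters (assumptions, not values from COBS).
NUM_BITS = 256    # bits in each per-document Bloom filter
NUM_HASHES = 3    # hash functions per term


def bit_positions(term):
    """Derive NUM_HASHES bit positions in [0, NUM_BITS) for a term."""
    positions = []
    for seed in range(NUM_HASHES):
        digest = hashlib.sha256(f"{seed}:{term}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % NUM_BITS)
    return positions


def build_index(docs):
    """Store the per-document Bloom filters transposed ("bit-sliced").

    rows[p] is an integer whose bit d is set when document d's
    Bloom filter has bit p set.
    """
    rows = [0] * NUM_BITS
    for doc_id, terms in enumerate(docs):
        for term in terms:
            for p in bit_positions(term):
                rows[p] |= 1 << doc_id
    return rows


def query(rows, term, num_docs):
    """AND together the rows for the term's bit positions.

    Bits still set afterwards mark candidate documents. As with any
    Bloom filter, false positives are possible, false negatives are not.
    """
    mask = (1 << num_docs) - 1
    for p in bit_positions(term):
        mask &= rows[p]
    return [d for d in range(num_docs) if (mask >> d) & 1]


docs = [{"cpu", "mem"}, {"disk"}, {"cpu", "net"}]
rows = build_index(docs)
print(query(rows, "cpu", len(docs)))  # always includes 0 and 2
```

A lookup only touches `NUM_HASHES` rows however many documents there are, and each row covers every document at once, which is why one combined structure can answer "which docs?" in a single pass.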