n_u | 4 days ago
Which part of the index are you putting in the buffer pool here? The postings list, the doc store or the terms dict? Is it being cached for future queries or are you just talking about putting it in memory to perform the computation for a query?
marginalia_nu | 4 days ago | parent
I'm primarily looking at document lists and possibly the keyword-to-documents mapping. Caching will likely be tuned fairly tightly to the operation itself, since this isn't a general-purpose DBMS and I can predict with decent accuracy which pages are worth caching, or when read-ahead is likely to be fruitful, based on the operation being performed. For keyword-to-document lookups, some LRU cache scheme is likely a good fit. When reading a document list sequentially, readahead works well, and I can tell the pool exactly how far to read ahead. When intersecting document lists, I can also generally predict which pages are likely to be re-read or needed later based on their position in the tree. It will definitely need a fair bit of tuning, but overall the problem is greatly simplified by revolving around a few very specific access patterns.
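To make that a bit more concrete, here is a minimal Java sketch of what such an operation-aware pool might look like: an LRU page cache for the keyword-to-document lookups, plus a caller-directed readahead path for sequential document-list reads. The names (BufferPool, getWithReadAhead, readPageFromDisk) and the page size are illustrative assumptions, not the actual implementation.

    import java.util.LinkedHashMap;
    import java.util.Map;

    /** Illustrative sketch of an access-pattern-aware buffer pool (hypothetical names). */
    class BufferPool {
        private static final int PAGE_SIZE = 4096; // assumed page size
        private final int capacity;

        // Access-ordered LinkedHashMap evicts the least recently used page once
        // the pool is full -- a reasonable fit for random-ish keyword lookups.
        private final Map<Long, byte[]> pages;

        BufferPool(int capacity) {
            this.capacity = capacity;
            this.pages = new LinkedHashMap<>(capacity, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                    return size() > BufferPool.this.capacity;
                }
            };
        }

        /** Point lookup, e.g. resolving a keyword to the header of its document list. */
        byte[] get(long pageId) {
            return pages.computeIfAbsent(pageId, this::readPageFromDisk);
        }

        /**
         * Sequential scan of a document list: the caller knows how long the list is,
         * so it can tell the pool exactly how many pages to prefetch.
         */
        byte[] getWithReadAhead(long pageId, int readAheadPages) {
            for (int i = 1; i <= readAheadPages; i++) {
                pages.computeIfAbsent(pageId + i, this::readPageFromDisk);
            }
            return get(pageId);
        }

        // Placeholder for the actual disk read (pread, mmap, etc.).
        private byte[] readPageFromDisk(long pageId) {
            return new byte[PAGE_SIZE];
        }
    }

An intersection pass could layer hints on top of this, e.g. pinning or prioritizing pages near the top of the tree, which are the ones most likely to be revisited.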