atombender 4 days ago
Inverted indexes are what databases call indexes. The term is used in the IR field to distinguish them from forward indexes, which are less common, so you're right that we could just say "indexes." But when we talk about inverted indexes, we almost always mean a term -> posting list mapping, and most index data structures lay these out so that posting lists are sorted and compressed together. Traditional database indexes like B-trees are optimized for rapid insertion and deletion, while inverted indexes tend to be optimized for batch processing, because you typically deconstruct text into words for a large batch of documents and then lazily integrate this batch into the main index. Part of this is about scale: an indexed database row typically contributes a single column, or maybe 2-3 columns in a composite index, but a document's text may tokenize into thousands, hundreds of thousands, or millions of words. At this scale, the fine-grained nature of words means B-trees aren't as good a fit. Another part of it is that inverted indexes aren't meant for point queries, which is what B-trees are optimized for; you typically search for many words at a time in order to rank your search results by some function like cosine similarity. You rarely want a single posting; you want the union or intersection of many postings, sorted by score.
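For readers who haven't built one, here is a minimal sketch of the ideas in the comment: batch-building a term -> sorted posting list map, gap-encoding a posting list (the usual first step before compression), intersecting two sorted lists for an AND query, and a ranked union for an OR query. The tokenizer, all function names, and the scoring rule (count of matched query terms, a crude stand-in for something like cosine similarity) are illustrative assumptions, not any particular engine's implementation.

```python
import re
from collections import Counter, defaultdict
from itertools import accumulate


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Batch-build term -> posting list, where each list holds sorted doc ids."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            postings[term].add(doc_id)
    # Sort every posting list once, at the end of the batch.
    return {term: sorted(ids) for term, ids in postings.items()}


def gap_encode(posting):
    # Sorted ids become small gaps (first id kept as-is), which compress well.
    return [b - a for a, b in zip([0] + posting, posting)]


def gap_decode(gaps):
    # Running sum restores the original doc ids.
    return list(accumulate(gaps))


def intersect(a, b):
    # Linear merge of two sorted posting lists (AND query).
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def ranked_union(index, query):
    # OR query: score = number of distinct query terms a doc contains
    # (a crude stand-in for a real ranking function such as cosine similarity).
    scores = Counter()
    for term in set(tokenize(query)):
        for doc_id in index.get(term, []):
            scores[doc_id] += 1
    return scores.most_common()


docs = [
    "B-trees support fast point queries",
    "inverted indexes map terms to posting lists",
    "posting lists are sorted and compressed",
]
index = build_index(docs)
print(index["posting"])                              # [1, 2]
print(intersect(index["posting"], index["sorted"]))  # [2]
print(gap_encode([3, 7, 8, 20]))                     # [3, 4, 1, 12]
```

Real engines would additionally store term frequencies and positions and byte-pack the gaps (e.g. with variable-length integers), but the sorted-list layout is what makes both the compression and the linear-time merges possible.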
modulovalue 4 days ago
NIT: That's not quite correct if your first statement is meant to imply equality rather than a subset relation. The idea of an index is more general, since an index can be built for many different domains. For example, B-trees can index monoidal data, and inverted indexes are just one instance of such a monoid that a B-tree can efficiently index. Furthermore, metric spaces (e.g., strings under Levenshtein distance) can also be efficiently indexed using other trees: metric trees. So calling inverted indexes just "indexes" would be really confusing, since string data is not the only kind of data a database might want efficient indexes for.
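To make the metric-tree point concrete, here is a minimal sketch of one such structure, a BK-tree (Burkhard-Keller tree). It is chosen here purely as an illustration; the comment doesn't name a specific metric tree. Each child is keyed by its distance to the parent, and a search uses the triangle inequality to skip whole subtrees. The class and function names are hypothetical.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance, computed one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


class BKTree:
    """Metric tree: each node is (word, {distance_to_parent: child_node})."""

    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word already present
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, query, max_dist):
        # Triangle inequality: with d = dist(query, node), only children whose
        # edge label lies in [d - max_dist, d + max_dist] can hold matches.
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            word, children = stack.pop()
            d = self.distance(query, word)
            if d <= max_dist:
                results.append((d, word))
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for w in ["book", "books", "cake", "boo", "cape", "cart"]:
    tree.add(w)
print(tree.search("bool", 1))  # [(1, 'boo'), (1, 'book')]
```

The same structure works for any function satisfying the metric axioms, which is the sense in which the index is defined over the metric space rather than over strings specifically.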