dataflow 6 days ago

Beginner(?) question: why is the model

  map<term_id, 
      list<pair<document_id, positions_idx>>
     > inverted_index;
and not

  map<term_id, 
      map<document_id, list<positions_idx>>
     > inverted_index;
(or using set<> in lieu of list<> as appropriate)?
marginalia_nu 6 days ago | parent [-]

This is meant as a metaphor that gives a mental model for the actual on-disk data structures, so there's some tradeoff in finding the most accurate metaphor for what is happening.

I actually think you are right; list<pair<...>> is a bit of a weird choice that doesn't convey the data structures very well. Map is better.

The most accurate thing would probably be something like map<term_id, map<document_id, pair<document_id, positions_idx>>>, but I corrected it to just a map<document_id, positions_idx> to avoid making things too confusing.
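
As a rough sketch of why the map shape reads better, here are the two models side by side in C++. Everything named here is an assumption for illustration only: the integer aliases, the use of std::vector for list<>, and the helpers find_in_list and find_in_map are hypothetical and not taken from the actual index code, which lives on disk rather than in std::map.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical aliases; the thread does not say what these types really are.
using term_id = std::uint64_t;
using document_id = std::uint64_t;
using positions_idx = std::size_t;  // assumed: an offset into a separate positions list

// Shape from the original question: term -> list of (document, positions) pairs.
using ListIndex =
    std::map<term_id, std::vector<std::pair<document_id, positions_idx>>>;

// Shape after the correction: term -> per-document map.
using MapIndex = std::map<term_id, std::map<document_id, positions_idx>>;

// List shape: finding one document means scanning the term's entries.
// Returns nullptr when the term or document is absent.
const positions_idx* find_in_list(const ListIndex& index, term_id term,
                                  document_id doc) {
    auto term_it = index.find(term);
    if (term_it == index.end()) return nullptr;
    for (const auto& entry : term_it->second) {
        if (entry.first == doc) return &entry.second;
    }
    return nullptr;
}

// Map shape: the same question is answered by two keyed lookups.
// Returns nullptr when the term or document is absent.
const positions_idx* find_in_map(const MapIndex& index, term_id term,
                                 document_id doc) {
    auto term_it = index.find(term);
    if (term_it == index.end()) return nullptr;
    auto doc_it = term_it->second.find(doc);
    if (doc_it == term_it->second.end()) return nullptr;
    return &doc_it->second;
}
```

Both helpers answer the same question; the map shape just makes the "look up a document within a term" step explicit in the type, which is the point of the metaphor.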

ch33zer 6 days ago | parent [-]

Currently it looks like this:

    map<term_id,
      map<pair<document_id, positions_idx>>
      inverted_index;
    list<positions> positions;

Think you also meant to remove the pair in map<pair>?

marginalia_nu 6 days ago | parent [-]

Haha, apparently very hard to get this right. Fixed again.