sanderjd 2 hours ago

Yeah that point about "random access is not the point of columnar formats" fell flat for me for this same reason. Almost since the first day I started using columnar data, I've been interested in solutions that strike this balance between batch and random access. This comes up all the time (in my experience) in data science / ML, where we have use cases for both access patterns against the same data.

So I'm with you: I'm very unconvinced that Parquet (and the various things that are Parquet or essentially-Parquet under the hood) is the end of the line here.