mgraczyk 6 days ago
Maybe a dumb question, but how do you know how many frames to seek past? For example, say you want to seek to 10MB into the uncompressed file. Do you need to store metadata separately to know how many frames to skip?
teraflop 6 days ago
A seekable Zstd file contains a seek table, which contains the compressed and uncompressed size of all frames. That's enough information to figure out which frame contains your desired offset, and how far into that frame's decompressed data it occurs.
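A minimal sketch of that lookup, assuming the seek table has already been parsed into a list of (compressed_size, decompressed_size) pairs in file order and that the entries cover the file from byte 0. The function name `locate` and the tuple layout are hypothetical, chosen for illustration, and are not part of any zstd API.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(frames, offset):
    """Map an uncompressed offset to (frame index, compressed start, offset in frame).

    `frames` is a hypothetical in-memory form of the seek table:
    a list of (compressed_size, decompressed_size) pairs in file order.
    """
    # Running total of decompressed bytes at the end of each frame.
    d_ends = list(accumulate(d for _, d in frames))
    total = d_ends[-1] if d_ends else 0
    if not 0 <= offset < total:
        raise ValueError("offset is outside the uncompressed data")

    # First frame whose decompressed end lies strictly past the offset.
    i = bisect_right(d_ends, offset)

    # Where that frame starts in the compressed file and in the uncompressed stream.
    c_start = sum(c for c, _ in frames[:i])
    d_start = d_ends[i - 1] if i else 0
    return i, c_start, offset - d_start
```

To read from the target offset you would seek to the returned compressed start, decompress that one frame, and discard the returned number of bytes from the front of its output.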
rwmj 6 days ago
Not sure about zstd, but in xz the blocks (frames in zstd) are stored across the file and linked by offsets into a linked list. So you can scan over the compressed file very quickly at the start and build an in-memory map of uncompressed virtual offsets to compressed file positions. Here's the code in nbdkit-xz-filter: https://gitlab.com/nbdkit/nbdkit/-/blob/master/filters/xz/xz...