phiresky 2 hours ago:
I think you're looking at this from a different perspective than me. At _build time_ you need to process O(n) data, yes, and generate O(n) additional data. But I said "the amount of data you need to decompress is a constant". At _read time_, you do exactly three steps:

1. Load the file index. This one scales with the number of files unless you do something else smart and get it down to O(log n). It gives you an offset into the archive, and *that same offset/32 is an offset into your gzip index.*

2. Take that offset and load 32 kB into memory. This is constant: it does not change with the number of files, the total size, or anything else apart from the actual file you are looking at.

3. Decompress a 1 MB chunk (or more if necessary).

So yes, it's a constant.
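To make the "32 kB is all the state you need" point concrete, here is a minimal sketch using Python's zlib: the 32 kB of plaintext preceding a chunk is exactly the DEFLATE window, so it can be fed back in as a preset dictionary to resume decompression mid-stream. This is an illustration of the technique, not the actual tool under discussion; the chunking scheme and names are assumptions.

```python
import zlib

# Illustrative data: pretend this is the concatenated archive contents.
data = b"hello world, this line repeats a lot. " * 2000

WINDOW = 32 * 1024  # DEFLATE window size: the 32 kB loaded in step 2

# Build time (illustrative): compress a chunk with the preceding
# 32 kB of plaintext as a preset dictionary, using raw deflate.
def compress_chunk(prev_window: bytes, chunk: bytes) -> bytes:
    c = zlib.compressobj(wbits=-15, zdict=prev_window)
    return c.compress(chunk) + c.flush()

# Read time (steps 2 + 3): seed the decompressor with the same
# 32 kB window, then decompress just this chunk. No earlier
# compressed data is needed.
def decompress_chunk(prev_window: bytes, compressed: bytes) -> bytes:
    d = zlib.decompressobj(wbits=-15, zdict=prev_window)
    return d.decompress(compressed)

window = data[:WINDOW]                     # constant-size state
chunk = data[WINDOW:WINDOW + 1000]         # the chunk we want to read
blob = compress_chunk(window, chunk)
assert decompress_chunk(window, blob) == chunk
```

The key property is that `decompress_chunk` only sees the 32 kB window plus the compressed chunk itself, so the read-time work is independent of the archive size.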
vlovich123 2 hours ago:
My bad. Yes, from the decompression perspective you have O(1) ancillary data to initiate decompression at a seek point.

This is how seeking can work in encrypted data, by the way, without any ancillary data: you just increment the IV every N bytes, so there's a deterministic mapping for deriving the IV of any block, and you're bounded by how much extra you need to encrypt/decrypt to do a random byte-range access of the plaintext.

But none of this is unique to gzip. You can do this for any compression algorithm provided you can snapshot the state - the state at a seek point is always going to be fairly small.