Remix.run Logo
mbreese 6 days ago

Oh, I’m pretty sure you could set a gzip header field with a full index and a zero-byte payload. You could even make it so that the size of that last block would be in a standard location in the file (at a known offset, still in the gzip header).

One issue with bgzip in particular is that it fixes the gzip header fields allowed, so you can only have one extra value (which is the size of the current block). Because of this, you can’t have new fields in the header for bgzip (the gzip flavor widely used in bioinformatics). One thing I wanted to do was to also add was a header field for sha1/sha256/etc for the current block. When you have files of sufficient size, it can be helpful to have chunk-level signatures to protect against bitrot. This is just one usecase for novel header elements (which is somewhat alleviated as gzip blocks all have their own crc32, but that’s just one idea).