k__ 4 days ago
"... the last 4 bytes of the gzip format. These bytes are special since [they] store the uncompressed size of the file!" What's the reason for this? I could imagine many tools would benefit from knowing the decompressed file size in advance.
philipwhiuk 4 days ago
It's straight from the GZIP spec, if you assume there's a single GZIP "member": https://www.ietf.org/rfc/rfc1952.txt
> ISIZE (Input SIZE)
> This contains the size of the original (uncompressed) input data modulo 2^32.
So there are two big caveats:
1. Your data is a single GZIP member (I guess this means everything in a folder)
2. Your data is < 2^32 bytes.
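To make the ISIZE field and the first caveat concrete, here is a minimal Python sketch. In RFC 1952 a "member" is one compressed data set, and a gzip file may be several of them concatenated. The helper name `read_isize` and the sample payloads are made up for illustration; only `gzip.compress`, `gzip.decompress`, and `struct.unpack` are standard library calls.

```python
import gzip
import struct


def read_isize(blob: bytes) -> int:
    """Return ISIZE: the last 4 bytes of a gzip stream, little-endian unsigned."""
    return struct.unpack("<I", blob[-4:])[0]


# A single member: the trailer matches the real uncompressed size.
payload = b"hello gzip" * 1000
single = gzip.compress(payload)
print(read_isize(single), len(payload))

# Two concatenated members: the final 4 bytes describe only the LAST member,
# so they no longer tell you the total decompressed size.
multi = gzip.compress(b"a" * 300) + gzip.compress(b"b" * 5)
print(read_isize(multi), len(gzip.decompress(multi)))
```

So a tool that trusts the last 4 bytes gets the right answer only for single-member files smaller than 4 GiB.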
lkbm 4 days ago
I believe it's because you get to stream-compress efficiently, at the cost of stream-decompress efficiency.
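A rough sketch of why the size ends up at the end: a one-pass compressor only knows the total length after it has seen the last chunk, so the CRC and size can only be written once everything else is out. This uses `zlib.compressobj` with `wbits=31` to request gzip framing; the function name `gzip_stream` and the sample chunks are hypothetical.

```python
import struct
import zlib


def gzip_stream(chunks):
    """Yield gzip-framed bytes for an iterable of byte chunks, in one pass."""
    # wbits=31 (16 + 15) asks zlib for a gzip header and CRC32/ISIZE trailer.
    comp = zlib.compressobj(wbits=31)
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:
            yield out
    # Only now is the total input size known, so the trailer is emitted last.
    yield comp.flush()


chunks = [b"x" * 4096 for _ in range(8)]
stream = b"".join(gzip_stream(chunks))

# The final 8 bytes are CRC32 then ISIZE, both little-endian.
crc, isize = struct.unpack("<II", stream[-8:])
print(isize, crc == zlib.crc32(b"".join(chunks)))
```

The writer never has to seek back and patch a header, which is what makes writing to a pipe or socket possible. The reader, on the other hand, has to reach the end of the stream before it learns the size.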
8cvor6j844qw_d6 4 days ago
gzip.py [1]
---
    def _read_eof(self):
        # We've read to the end of the file, so we have to rewind in order
        # to reread the 8 bytes containing the CRC and the file size.
        # We check that the computed CRC and size of the
        # uncompressed data matches the stored values.  Note that the size
        # stored is the true file size mod 2**32.
---
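The check that comment describes can be approximated by hand. This is a sketch of the idea, not the actual CPython implementation; `trailer_matches` and `stored_size_for` are hypothetical helper names, and the first one assumes a single-member blob.

```python
import gzip
import struct
import zlib


def trailer_matches(blob: bytes) -> bool:
    """Compare a single-member gzip blob against its stored CRC32 and size."""
    data = gzip.decompress(blob)
    # Last 8 bytes: CRC32 of the uncompressed data, then its size mod 2**32.
    stored_crc, stored_size = struct.unpack("<II", blob[-8:])
    return (zlib.crc32(data) == stored_crc
            and (len(data) % 2**32) == stored_size)


def stored_size_for(true_size: int) -> int:
    """Size value the trailer would hold for an input of true_size bytes."""
    return true_size % 2**32


print(trailer_matches(gzip.compress(b"abc" * 100)))
# A 5 GiB input wraps around: the trailer would claim only 1 GiB.
print(stored_size_for(5 * 2**30) == 2**30)
```

The wraparound is why the stored size works as an integrity check but not as a reliable size for large files.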