| ▲ | est 2 days ago |
| I wonder if there are any reverse zip-bombs? e.g. a really big .zip file that takes a long time to unzip but yields only a few bytes of content. Like bombing the CPU time instead of memory. |
|
| ▲ | nwallin 2 days ago | parent | next [-] |
| Trivially. Zip file headers specify where the data is; all other bytes are ignored. That's how self-extracting archives and installers work while still being valid zip files: the extractor part is just a regular executable, a zip decompressor that decompresses the archive appended to itself. This is specific to the zip format, not the deflate algorithm. |
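| (A minimal sketch of the point above, my own illustration rather than from the thread: Python's zipfile locates the central directory from the end of the file, so an archive still opens with arbitrary bytes prepended, the same layout a self-extractor stub relies on. Filenames here are made up.) |

  import io, zipfile

  # Build a tiny archive in memory.
  buf = io.BytesIO()
  with zipfile.ZipFile(buf, "w") as zf:
      zf.writestr("hello.txt", "hi")

  # Prepend 10 MB of junk; readers skip it, like a self-extractor's stub.
  padded = b"\x00" * 10_000_000 + buf.getvalue()
  with zipfile.ZipFile(io.BytesIO(padded)) as zf:
      print(zf.read("hello.txt"))  # b'hi'

| Note this only inflates the file on disk; it does nothing to slow extraction down, which is the objection raised further down the thread. |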
| |
| ▲ | Retr0id 2 days ago | parent | next [-] | | There are also deflate-specific tricks you can use - just spam empty non-final blocks ad infinitum:

  import zlib
  # Each b"\x00\x00\x00\xff\xff" is an empty non-final stored block
  # (BFINAL=0, BTYPE=00, LEN=0, NLEN=0xffff); b"\x03\x00" is an empty
  # final block that terminates the raw deflate stream (wbits=-15).
  zlib.decompress(b"\x00\x00\x00\xff\xff" * 1000 + b"\x03\x00", wbits=-15)

If you want to spin more CPU, you'd probably want to define random Huffman trees and then never use them. | | |
| ▲ | Retr0id 2 days ago | parent [-] | | I had Claude implement the random-Huffman-trees strategy and it works alright (~20 MB/s decompression speed), but a minimal Huffman tree that only encodes the end-of-block symbol works out even slower (~10 MB/s), presumably because each tree is more compact. The minimal version boils down to:

  bytes.fromhex("04c001090000008020ffaf96") * 1000000 + b"\x03\x00"
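| (A rough timing harness for the payload above, my own sketch: each 12-byte chunk is one dynamic-Huffman block whose code tables encode only the end-of-block symbol, so the ~12 MB stream inflates to zero bytes while the decompressor rebuilds tables the whole way.) |

  import time, zlib

  # Payload from the comment above: ~12 MB of raw deflate, empty output.
  payload = bytes.fromhex("04c001090000008020ffaf96") * 1000000 + b"\x03\x00"
  start = time.perf_counter()
  out = zlib.decompress(payload, wbits=-15)
  elapsed = time.perf_counter() - start
  print(f"{len(payload) / elapsed / 1e6:.1f} MB/s consumed, {len(out)} bytes out")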
|
| |
| ▲ | ks2048 2 days ago | parent | prev [-] | | That would be a big zip file, but would not take a long time to unzip. |
|
|
| ▲ | zipping1549 2 days ago | parent | prev [-] |
| Isn't that mathematically impossible? |
| |
| ▲ | hayley-patton 2 days ago | parent | next [-] | | I'm pretty sure it's mathematically guaranteed that you have to be bad at compressing something. You can't compress data to less than its entropy, so totally random bytes (where entropy = size) have a high probability of not compressing at all, unless identifiable patterns appear in them by sheer coincidence. Given incompressible data, the least bad option is to signal to the decompressor to reproduce the data verbatim, without any compression, and including that signal somehow makes the output bigger than the input. Therefore there is always some input for a compressor that causes it to produce a larger output, even by some minuscule amount. | |
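| (Easy to see empirically; a quick sketch of my own: zlib on random bytes has essentially nothing to match, so the stream header, block framing, and checksum push the output slightly past the input.) |

  import os, zlib

  data = os.urandom(1 << 20)       # 1 MiB of incompressible random bytes
  packed = zlib.compress(data, 9)
  print(len(data), len(packed))    # packed comes out slightly larger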
| ▲ | hdjrudni 2 days ago | parent | prev [-] | | Why's that? I'm not really sure how DEFLATE works, but I can imagine a crappy compression scheme where "5 0" means "00000". So if you try to compress "0" you get "1 0", which is longer than the input. In fact, I bet this is true for any well-compressed format: zipping a JPEG XL image will probably yield something larger. Much larger, though? I don't know how you'd do that. |
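| (The toy scheme above in runnable form, a hypothetical coder of my own: run-length encoding where "5 0" stands for "00000", which necessarily expands inputs with runs of length one.) |

  # Hypothetical run-length coder matching the "5 0" example above.
  def rle(s: str) -> str:
      out, i = [], 0
      while i < len(s):
          j = i
          while j < len(s) and s[j] == s[i]:
              j += 1                   # extend the current run
          out.append(f"{j - i} {s[i]}")
          i = j
      return " ".join(out)

  print(rle("00000"))  # "5 0" - shorter than the input
  print(rle("0"))      # "1 0" - longer than the input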
|