measurablefunc 2 days ago

Decompression is equivalent to executing code for a specialized virtual machine. It should be possible to automate this process of finding "small" programs that generate "large" outputs. Could even be an interesting AI benchmark.
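As a concrete illustration of "small program, large output" (my own sketch using Python's zlib, not anything from the thread): DEFLATE encodes a long run as a handful of back-reference "instructions", so the compressed bitstream really does act like a tiny program whose output is orders of magnitude larger.

```python
import zlib

# A zip bomb in miniature: 100 MB of zeros compresses to a tiny
# stream of back-reference "instructions" for the DEFLATE "VM".
output_size = 100 * 1024 * 1024            # 100 MB of zeros
program = zlib.compress(b"\x00" * output_size, 9)

print(f"program: {len(program)} bytes")    # on the order of 100 KB
print(f"output:  {len(zlib.decompress(program))} bytes")
```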

shakna 2 days ago | parent | next [-]

Many of them already do this. [0]

It is a much easier problem to solve than you would expect. No need to drag in a data centre when heuristics can get you close enough.

[0] https://sources.debian.org/patches/unzip/6.0-29/23-cve-2019-...
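The linked patch hardens unzip against overlapped-entry bombs; a generic version of the heuristic idea (not the patch's actual logic, just a sketch) is to cap the output size during streaming decompression and abort when the cap is hit.

```python
import zlib

def safe_decompress(data: bytes, max_output: int = 10 * 1024 * 1024) -> bytes:
    """Decompress, but abort once output would exceed max_output.

    A hypothetical guard, not the unzip patch: zlib's decompressobj
    stops at max_length and parks the rest of the input in
    unconsumed_tail, which signals the cap was hit.
    """
    d = zlib.decompressobj()
    out = d.decompress(data, max_output)
    if d.unconsumed_tail:
        raise ValueError("decompression bomb suspected: output cap exceeded")
    return out
```

No static analysis needed: the check is a constant-space runtime heuristic, which is why it is cheap enough for tools like unzip.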

measurablefunc 2 days ago | parent [-]

I meant it should be possible to take a specialized virtual machine that is equivalent to decompressing some compressed bitstream & figure out how to write small programs for it that generate large outputs. I didn't mean static analysis to decide whether a given small program will generate a large output, although that is also an interesting problem to solve & would also be an interesting AI benchmark.

bikeshaving 2 days ago | parent | prev [-]

My guess is this embeds the halting problem (does this program accept data whose decompression never halts?), and is therefore beautifully undecidable. You are free to leave zip/tgz/whatever fork bombs as little mines for living-off-the-land advanced persistent threats in your filesystems.

machinationu 2 days ago | parent [-]

it's not. decompression always terminates: the decoder only ever moves forward through the input stream, so it eventually runs out of input. but the output can be enormous, so it might take a while
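That forward-progress argument can be sketched directly (my own illustration with zlib, not from the thread): each step either consumes input or emits a bounded chunk of output, so the loop below always terminates, even though the total output may dwarf the input.

```python
import zlib

def decompressed_size(data: bytes, chunk: int = 64 * 1024) -> int:
    """Measure output size in bounded steps without holding it all.

    Every iteration either consumes input or emits up to `chunk`
    bytes, so the loop terminates; the input pointer never rewinds.
    """
    d = zlib.decompressobj()
    total, buf = 0, data
    while True:
        out = d.decompress(buf, chunk)
        total += len(out)
        if d.eof:
            return total
        buf = d.unconsumed_tail
        if not out and not buf:
            return total  # truncated stream: no further progress possible
```

This is also why the practical risk is resource exhaustion (time, disk, memory) rather than non-termination.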