▲ | ball_of_lint 15 hours ago | |||||||
This strategy or something like it legitimately wins the challenge. The challenge (based on expected value) is not to find some compression scheme that will compress 100% or even 50% of files to less than their original size. Instead it's to find any (set of) compression schemes that will compress more than 1/50 = 2% of arbitrary random files to less than their original size. You can construct such a program essentially as they described by asking for a large file size and then searching for a valid decompression program within that file. With standard tools you can make some valid decompression program quite short, which will allow you a reasonable probability that it will appear within a large randomly-generated file. The cost that you pay for using a compression scheme like this is that where the compression scheme works you'll win some bytes, but in the more common case where it doesn't you'll lose some, even for the simple `cat $1 > original` default decompressor. Now, a clever contest creator might arbitrarily choose files that win against such decompressors that they can think of, but Mike appears to have not have done so. | ||||||||
▲ | LegionMammal978 14 hours ago | parent | next [-] | |||||||
The problem is what you do outside the file: even if it contains a working decompressor, you have to actually run that decompressor, which most likely requires specifying the byte offset and doing some other work to extract it. And then the decompressor needs to shorten the file even past the length of that byte offset and extractor, which can't be done with reasonable probability. There are some formats which search for a valid header anywhere in the file, but then they'd get tripped up by all the mostly-valid headers that don't actually yield a decompressed file. | ||||||||
▲ | GuB-42 13 hours ago | parent | prev [-] | |||||||
For a truly random file, your 1/50 compression scheme that will make it smaller than the original will only make it smaller by 5 or 6 bits, maybe more if you are lucky, but it is an exponential, so realistically, you won't be able to save more than a byte or two, and you can't write that decompressor in two bytes. The only way to win is to reverse the random number generator used to create the contest file, or to exploit some bias if there is one. But if the file is generated properly, most likely with some CSPRNG, good luck going that route. The contest creator has to use a random file, any attempt to be clever will actually decrease entropy and may result in the contestant winning. | ||||||||
|