kazinator 6 hours ago
But look, the Static Huffman results (a simpler compression encoding, with fewer ways for decoding to fail) almost bear out a certain aspect of the friend's intuition, in the following way:

* only 4.4% of the random data disassembles.
* only 4.0% of the random data decodes as Static Huffman.

BUT:

* 1.2% of the data both decompresses and disassembles.

Relative to the 4.0% that decompresses, 1.2% is 30%. In other words, 30% of successfully decompressed material also disassembles. That's something that could benefit from an explanation. Why is the conditional probability of a good disassembly, given a successful Static Huffman expansion, evidently so much higher than the probability of a disassembly straight from random data?
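To spell out the arithmetic, here's a tiny sketch using only the three percentages quoted above:

    # Conditional probability implied by the quoted figures.
    p_disassembles = 0.044  # random data that disassembles
    p_decodes      = 0.040  # random data that decodes as Static Huffman
    p_both         = 0.012  # decodes AND disassembles

    p_disasm_given_decode = p_both / p_decodes
    print(f"P(disassembles | decodes) = {p_disasm_given_decode:.0%}")  # ~30%
    print(f"P(disassembles)           = {p_disassembles:.0%}")         # ~4%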
Dylan16807 6 hours ago | parent
There's an important number missing here, which is how many of the 128 bytes were consumed in each test. With 40 million "success" results but only 570 "end of stream" results, I think that implies that out of a billion tests the decoder read all 128 bytes fewer than a thousand times.

As a rough estimate from the static Huffman tables, each symbol gives you about an 80% chance of outputting a literal byte, an 18% chance of crashing, a 1% chance of repeating some earlier bytes, and a 1% chance of ending decompression. As the output gets longer, the odds tilt a few percent further toward repeating instead of crashing. But on average the decoder is going to consume only a few of the 128 input bytes, outputting them in a slightly shuffled way plus some repetitions.
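If anyone wants to check those estimates, here's a back-of-envelope sketch based on the fixed Huffman code lengths in RFC 1951, section 3.2.6. Since the code is complete, a symbol whose code is L bits long comes up with probability 2^-L on uniformly random input. It reproduces roughly the split above once you note that, early in the stream, most length codes die anyway because the 5-bit distance usually points before the start of the output (which is what tilts the balance from "crash" toward "repeat" as decoding goes on):

    # Per-symbol probabilities when feeding random bits to DEFLATE's
    # fixed (static) Huffman literal/length code (RFC 1951, 3.2.6).
    litlen_lengths = {}
    litlen_lengths.update({sym: 8 for sym in range(0, 144)})    # literals 0-143
    litlen_lengths.update({sym: 9 for sym in range(144, 256)})  # literals 144-255
    litlen_lengths[256] = 7                                     # end-of-block
    litlen_lengths.update({sym: 7 for sym in range(257, 280)})  # length codes
    litlen_lengths.update({sym: 8 for sym in range(280, 288)})  # length codes (286/287 invalid)

    def p(symbols):
        # A complete Huffman code maps an L-bit symbol to probability 2**-L
        # under uniformly random input bits.
        return sum(2.0 ** -litlen_lengths[s] for s in symbols)

    print(f"literal byte     : {p(range(0, 256)):.1%}")    # ~78%
    print(f"end of block     : {p([256]):.1%}")            # ~0.8%
    print(f"length (repeat)  : {p(range(257, 286)):.1%}")  # ~20%, but most of these
    # fail early on, because the distance code tends to point before the
    # start of the output produced so far.
    print(f"invalid (286/287): {p([286, 287]):.1%}")       # ~0.8%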