kazinator 6 hours ago
But look, the Static Huffman results (a simpler compression encoding, with fewer ways for decoding to fail) almost bear out a certain aspect of the friend's intuition, in the following way:

* only 4.4% of the random data disassembles.
* only 4.0% of the random data decodes as Static Huffman.

BUT:

* 1.2% of the data both decompresses and disassembles.

Relative to the 4.0% that decompresses, 1.2% is 30%. In other words, 30% of successfully decompressed material also disassembles. That's something that could benefit from an explanation. Why is the conditional probability of a good disassembly, given a successful Static Huffman expansion, evidently so much higher than the probability of a disassembly straight from random data?
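To spell out the arithmetic, here's a tiny sketch using only the three percentages quoted above:

    # Conditional probability implied by the quoted figures.
    p_disassembles = 0.044  # random data that disassembles
    p_decodes      = 0.040  # random data that decodes as Static Huffman
    p_both         = 0.012  # decodes AND disassembles

    p_disasm_given_decode = p_both / p_decodes
    print(f"P(disassembles | decodes) = {p_disasm_given_decode:.0%}")  # ~30%
    print(f"P(disassembles)           = {p_disassembles:.0%}")         # ~4%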
Dylan16807 6 hours ago | parent
There's an important number missing here, which is how many of the 128 bytes were consumed in each test. With 40 million "success" results but only 570 "end of stream" results, I think that implies that out of a billion tests the decoder read all 128 bytes fewer than a thousand times.

As a rough estimate from the static Huffman tables, each symbol gives you about an 80% chance of outputting a literal byte, an 18% chance of crashing, a 1% chance of repeating some earlier bytes, and a 1% chance of ending decompression. As the output gets longer, the odds tilt a few percent further toward repeating instead of crashing. But on average the decoder is going to consume only a few of the 128 input bytes, outputting them in a slightly shuffled way plus some repetitions.
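If anyone wants to check those estimates, here's a back-of-envelope sketch based on the fixed Huffman code lengths in RFC 1951, section 3.2.6. Since the code is complete, a symbol whose code is L bits long comes up with probability 2^-L on uniformly random input. It reproduces roughly the split above once you note that, early in the stream, most length codes die anyway because the 5-bit distance usually points before the start of the output (which is what tilts the balance from "crash" toward "repeat" as decoding goes on):

    # Per-symbol probabilities when feeding random bits to DEFLATE's
    # fixed (static) Huffman literal/length code (RFC 1951, 3.2.6).
    litlen_lengths = {}
    litlen_lengths.update({sym: 8 for sym in range(0, 144)})    # literals 0-143
    litlen_lengths.update({sym: 9 for sym in range(144, 256)})  # literals 144-255
    litlen_lengths[256] = 7                                     # end-of-block
    litlen_lengths.update({sym: 7 for sym in range(257, 280)})  # length codes
    litlen_lengths.update({sym: 8 for sym in range(280, 288)})  # length codes (286/287 invalid)

    def p(symbols):
        # A complete Huffman code maps an L-bit symbol to probability 2**-L
        # under uniformly random input bits.
        return sum(2.0 ** -litlen_lengths[s] for s in symbols)

    print(f"literal byte     : {p(range(0, 256)):.1%}")    # ~78%
    print(f"end of block     : {p([256]):.1%}")            # ~0.8%
    print(f"length (repeat)  : {p(range(257, 286)):.1%}")  # ~20%, but most of these
    # fail early on, because the distance code tends to point before the
    # start of the output produced so far.
    print(f"invalid (286/287): {p([286, 287]):.1%}")       # ~0.8%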