Dylan16807 7 hours ago

There's an important number that's missing here, which is how many of the 128 bytes were consumed in that test.

With 40 million "success" and 570 "end of stream", I think that implies that out of a billion tests it read all 128 bytes fewer than a thousand times.

As a rough estimate from the static Huffman tables, each symbol gives you about an 80% chance of outputting a byte, an 18% chance of crashing, a 1% chance of repeating some bytes, and a 1% chance of ending decompression. As the output gets longer, the odds tilt a few percent more toward repeating instead of crashing. But on average it's going to use relatively few of the 128 bytes of input, outputting them in a slightly shuffled way plus some repetitions.
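Those odds can be sanity-checked directly from the fixed literal/length code in RFC 1951: since that code is complete, a random bit stream hits each n-bit codeword with probability 2^-n. A small sketch (the symbol grouping follows the RFC; the breakdown of "crash" vs. "repeat" after a length code depends on the 5-bit distance code and the window contents, which this doesn't model):

```python
# Per-symbol odds for random bits against DEFLATE's fixed (static)
# literal/length Huffman table (RFC 1951, section 3.2.6).
# The code is complete, so a random stream picks each n-bit code
# with probability 2**-n and the probabilities sum to 1.

# Code lengths of the fixed literal/length alphabet:
#   symbols 0..143   -> 8 bits (literal bytes)
#   symbols 144..255 -> 9 bits (literal bytes)
#   symbol  256      -> 7 bits (end of block)
#   symbols 257..279 -> 7 bits (length codes)
#   symbols 280..285 -> 8 bits (length codes)
#   symbols 286..287 -> 8 bits (invalid: immediate decode error)
lengths = [8] * 144 + [9] * 112 + [7] * 24 + [8] * 8

def prob(symbols):
    return sum(2.0 ** -lengths[s] for s in symbols)

p_literal = prob(range(0, 256))    # outputs one byte       ~78%
p_end     = prob([256])            # ends decompression     ~0.8%
p_length  = prob(range(257, 286))  # starts a back-reference ~20%
p_invalid = prob(range(286, 288))  # invalid code            ~0.8%

print(f"literal {p_literal:.4f}  end {p_end:.4f}  "
      f"length {p_length:.4f}  invalid {p_invalid:.4f}")
```

The ~20% length-code bucket mostly ends up as the "crash" case: the following 5-bit distance code can itself be invalid (2 of 32 codes), and early in the output most decoded distances point before the start of the window, which is also an error.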

kazinator 5 hours ago | parent [-]

It should probably work like a tokenizer: recognize the longest prefix of the remaining input that Huffman-decodes, consume that prefix, and repeat.

Even that won't find the maximal amount of decoding that is possible; for that you have to slide through the input bit by bit and try decoding at every bit position.

However, it seems fair because you wouldn't disassemble that way. If you disassemble some bytes successfully, you skip past those and keep going.