Remix.run Logo
Retr0id 2 days ago

Somewhere (discussed on HN) someone devised a "better-than-perfect" compressor. Most inputs get compressed (smaller than input), except for one input that does not. That one input is cryptographically impossible to find - or something along those lines.

Unfortunately I can't find the article I'm describing here, maybe someone else can? It was a long time ago so I might be misrepresenting it slightly.

phire 15 hours ago | parent | next [-]

It can't exist.

Presume this compressor guarantees the output will always be at least one byte smaller (with the exception of the impossible to find input)

So just keep running your data in a loop through the compressor. If you start with a 1MB file, it will take a maximum of a million iterations until the output shrinks down to zero bytes, which is the smallest possible file. Record how many iterations it too.

You can now extract your file by feeding a zero byte file into the decompressor and running the same number of iterations. Which You can now store every 1MB (or smaller) file in the world in just 20 bits.... But that would means there are only 1 million possible 1MB files?

Even if you put some minimum output size limitation on the compressor, say it can't produce any file less than 512 bits, the same argument applies. It's just that the numbers get bigger.

spencerflem 2 days ago | parent | prev | next [-]

That's how all compressors work, in that likely files (eg. ASCII, obvious patterns, etc) become smaller and unlikely files become bigger.

Retr0id an hour ago | parent | next [-]

Right, but the point was, the case where it became bigger was ~impossible to find.

Dylan16807 2 days ago | parent | prev | next [-]

> likely files (eg. ASCII, obvious patterns, etc) become smaller

Likely files for a real human workload are like that, but if "most inputs" is talking about the set of all possible files, then that's a whole different ball game and "most inputs" will compress very badly.

> unlikely files become bigger

Yes, but when compressors can't find useful patterns they generally only increase size by a small fraction of a percent. There aren't files that get significantly bigger.

a day ago | parent [-]
[deleted]
PaulHoule 2 days ago | parent | prev [-]

In some cases it can be certain, the ascii encoded in the usual 8 bits has fat to trim even if it is random in that space.

Dylan16807 2 days ago | parent | prev [-]

You could design the opposite, a system where ultra-rare files get compressed a lot but most don't compress.

But as stated, what you said is not possible. To compress by a single byte, only 1/256 files can be compressible, or you'd have multiple different inputs compressing down to the same output.