pkilgore a day ago

To me, the most interesting thing here isn't that you can compress something better by removing randomly distributed, semantically meaningless information. It's why zstd --long does so much better than gzip once you do, while default zstd does worse than gzip.

What lessons can we take from this?

cogman10 21 hours ago

I don't know why the default does worse than gzip. Why --long is so effective is likely a result of evolution, of all things :). Many species share common ancestors, which means shared genetic patterns across species. --long lets zstd match against a much larger window of earlier data (128 MiB by default, up to 2 GiB with --long=31), so it's likely finding all those genetic similarities across species.
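A rough way to see the window-size effect, as a sketch under stated assumptions: gzip uses DEFLATE, which can only refer back about 32 KiB, so a repeat farther back than that is invisible to it. To stay within the Python standard library, this uses `lzma` (default dictionary of 8 MiB at preset 6) as a stand-in for a long-window compressor like zstd --long; it is not zstd itself. The helper names `repeated_random` and `sizes` are hypothetical, made up for illustration.

```python
import lzma
import random
import zlib

# DEFLATE (used by gzip and zlib) can only look back 32 KiB.
DEFLATE_WINDOW = 32 * 1024


def repeated_random(block_size: int, seed: int = 0) -> bytes:
    """Return an incompressible random block followed by an exact copy of it.

    The only redundancy in the data is the repeat, which sits
    `block_size` bytes back from where the copy starts.
    """
    block = random.Random(seed).randbytes(block_size)
    return block + block


def sizes(block_size: int) -> tuple[int, int, int]:
    """Return (input size, zlib size, lzma size) for a repeated random block."""
    data = repeated_random(block_size)
    return len(data), len(zlib.compress(data, 9)), len(lzma.compress(data))


if __name__ == "__main__":
    # One repeat inside the DEFLATE window, one well outside it
    # (but still inside lzma's default 8 MiB dictionary).
    for block_size in (DEFLATE_WINDOW // 2, DEFLATE_WINDOW * 4):
        raw, z, x = sizes(block_size)
        print(f"block={block_size:>7}  raw={raw:>7}  zlib={z:>7}  lzma={x:>7}")
```

With the 16 KiB block both compressors should roughly halve the input; with the 128 KiB block zlib should stay near the raw size while lzma still roughly halves it, because only lzma can reach back far enough to find the copy.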

Endogenous retroviruses [1] are interesting bits of genetics that help link related species together. A virus inserts a bit of its genetics into the host, which can permanently scar the host's DNA and the DNA of all its offspring.

[1] https://en.wikipedia.org/wiki/Endogenous_retrovirus