pkilgore a day ago:
To me the most interesting thing here isn't that you can compress something better by removing randomly distributed, semantically meaningless information. It's why zstd --long does so much better than gzip when you do, while the default zstd does worse than gzip. What lessons can we take from this?
cogman10 21 hours ago (reply):
Why the default does worse than gzip isn't something I know. Why --long is so efficient is likely a result of evolution, of all things :). A lot of species share common ancestors, which means shared genetic patterns across species. --long lets zstd look across a window of up to 2 GB of data, so it's likely finding all those genetic similarities across species.

Endogenous retroviruses [1] are interesting bits of genetics that help link related species together. A virus will inject a bit of its genetic material into the host, which can effectively leave a permanent scar in the host's DNA and in all of its offspring's DNA.
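If you want to see the window-size effect for yourself, here's a rough sketch of the comparison being described (the filename and compression level are just placeholders; the flags themselves are standard gzip/zstd options):

  # gzip (DEFLATE) only has a 32 KB window, so repeats far apart in the file are invisible to it
  gzip -9 -k genomes.fa

  # zstd without --long still uses a much smaller window than long-distance matching provides
  zstd -19 genomes.fa -o genomes.default.zst

  # --long enables long-distance matching; --long=31 allows a ~2 GB window (64-bit platforms only)
  zstd -19 --long=31 genomes.fa -o genomes.long.zst

  # decompression has to allow the same window size
  zstd -d --long=31 genomes.long.zst

The tradeoff is memory: a 2 GB window means the compressor and decompressor both need roughly that much RAM to work with.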