duskwuff 6 hours ago

Concur. Zstandard is a good compressor, but it's not magical; comparing the compressed size of Zstd(A+B) to the combined size of Zstd(A) + Zstd(B) is effectively just a complicated way of measuring how many words and phrases the two documents have in common. That isn't entirely ineffective at judging whether they're about the same topic, but it's an unnecessarily complex and easily confused way of doing so.
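To make the comparison concrete, here is a rough sketch of the measurement being described. It is an illustration under assumptions, not anyone's actual method: `zlib` from the Python standard library stands in for Zstandard so the snippet runs without extra packages, and the helper names `csize` and `shared_savings` and the sample strings are made up for this example.

```python
import zlib


def csize(data: bytes) -> int:
    """Compressed size in bytes (zlib used here as a stand-in for Zstd)."""
    return len(zlib.compress(data, 9))


def shared_savings(a: bytes, b: bytes) -> int:
    """Bytes saved by compressing a+b together instead of separately.

    A larger value means the compressor found more byte sequences in b
    that it could encode as back-references into a.
    """
    return csize(a) + csize(b) - csize(a + b)


# Hypothetical sample documents, chosen only for illustration.
doc_a = b"Zstandard is a fast lossless compression algorithm that uses a large match window and entropy coding."
doc_b = b"Zstandard is a fast lossless compression algorithm; its match window is large and it uses entropy coding."
doc_c = b"My cat sat on the warm windowsill and ignored the rain all afternoon."

print("a vs b:", shared_savings(doc_a, doc_b))
print("a vs c:", shared_savings(doc_a, doc_c))
```

The score only rewards repeated byte sequences, so two documents that reuse the same wording on different topics can still score high, and two documents on the same topic with different wording can score low.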

srean 29 minutes ago | parent | next [-]

I do not know the inner details of Zstandard, but I would expect it to at least use suffix/prefix stats or word-fragment stats, not just words and phrases.

D-Machine 2 hours ago | parent | prev [-]

Yup. Data compression ≠ semantic compression.