notpushkin 3 hours ago
> The application of compressors for text statistics is fun, but it's a software equivalent of discovering that speakers and microphones are in principle the same device.

I think it makes sense to explore it from a practical standpoint, too. It’s in the Python stdlib and works reasonably well, so for some applications it might be good enough. It’s also fairly easy to implement in other languages with zstd bindings, or even shell scripts:
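For a rough idea of the technique in Python, here is a minimal sketch (not the snippet from the original comment). It assumes the usual "which corpus makes this text cheapest to compress" approach, and it swaps in `zlib` for zstd so it runs on any Python 3.9+. The toy corpora and the `extra_bytes` / `classify` helpers are made up for illustration.

```python
import zlib

# Hypothetical toy corpora, one per class; real use would need far more text.
CORPORA = {
    "sports": (
        "the team won the match after scoring a late goal in the final minute. "
        "the coach praised the defence and the goalkeeper for a clean sheet."
    ),
    "cooking": (
        "simmer the onions and garlic in olive oil, then add the chopped tomatoes. "
        "season with salt and pepper and serve the sauce over fresh pasta."
    ),
}


def compressed_size(data: bytes) -> int:
    """Return the length of the zlib-compressed form of data."""
    return len(zlib.compress(data, 9))


def extra_bytes(corpus: str, text: str) -> int:
    """Extra compressed bytes needed once text is appended to corpus."""
    base = corpus.encode("utf-8")
    combined = base + b" " + text.encode("utf-8")
    return compressed_size(combined) - compressed_size(base)


def classify(text: str, corpora: dict[str, str] = CORPORA) -> str:
    """Pick the class whose corpus makes text cheapest to compress."""
    return min(corpora, key=lambda label: extra_bytes(corpora[label], text))


print(classify("scoring a late goal in the final minute"))
```

Recompressing the whole corpus for every query is wasteful. A compressor with a trained or preset dictionary (which is what zstd offers) avoids that, but the comparison being made is the same.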
notpushkin 2 hours ago
Or with the 20 Newsgroups dataset:
Output:
Pretty neat IMO.