staplung 2 hours ago
This has been possible with the zlib module since 1997 [EDIT: zlib is from '97. The zdict param wasn't added until 2012]. You even get similar byte count outputs to the example and on my machine, it's about 10x faster to use zlib.
notpushkin 2 hours ago | parent
True. The post calls out that “you have to recompress the training data for each test document” with zlib (otherwise input_text would taint it), but you can actually call Compress.copy(). zdict was added in Python 3.3, though, so it’s 2012, not 1997. (It might have worked before, just not a part of the official API :-)
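
For anyone who wants to try the two ideas above together, here is a minimal sketch, not the code from the post. It assumes Python 3.3+ for `zdict`, and the helper names (`make_primed`, `cost`) and the toy training strings are made up for illustration. Each class gets one compressor primed with its training text, and `Compress.copy()` gives a throwaway clone per test document so the primed state is never tainted.

```python
import zlib

def make_primed(training: bytes):
    """Return a compressor whose dictionary is preset to `training` (hypothetical helper)."""
    # zdict requires Python 3.3+. Deflate only looks back 32 KiB,
    # so only the tail of a long training text can be matched against.
    return zlib.compressobj(level=9, zdict=training)

def cost(primed, document: bytes) -> int:
    """Compressed size of `document` under the primed state (hypothetical helper)."""
    # Work on a copy so the test document never modifies `primed`.
    clone = primed.copy()
    return len(clone.compress(document) + clone.flush())

# Toy training data, one primed compressor per class.
sports = make_primed(b"goal match team score league striker penalty referee " * 50)
cooking = make_primed(b"recipe oven butter flour simmer whisk garlic saucepan " * 50)

doc = b"the striker scored a penalty and the referee ended the match"

# The class whose training text makes the document compress smallest wins.
scores = {"sports": cost(sports, doc), "cooking": cost(cooking, doc)}
best = min(scores, key=scores.get)
print(scores, best)
```

Because each call to `cost` works on a fresh copy, the same primed compressor can be reused for any number of test documents without rebuilding it.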