rini17 4 days ago

In general, a good preprocessing step might be to check for punctuation that repeats at fixed intervals, remove it before compression, and restore it after decompression.
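
A rough sketch of that idea in Python, assuming the delimiter is a newline sitting after every fixed-width line (as in line-wrapped sequence files). The names `strip_periodic` and `restore_periodic` are made up for illustration, and only a width and a trailing-delimiter flag need to be stored alongside the payload:

```python
def strip_periodic(data: bytes, delim: bytes = b"\n"):
    """Return (payload, width, trailing) if delim recurs at a fixed interval, else None."""
    chunks = data.split(delim)
    # Remember whether the input ended with the delimiter.
    trailing = len(chunks) > 1 and chunks[-1] == b""
    if trailing:
        chunks.pop()
    if len(chunks) < 2:
        return None  # fewer than two lines: nothing periodic to remove
    width = len(chunks[0])
    if width == 0:
        return None
    # Every line except the last must have exactly the same width.
    if any(len(c) != width for c in chunks[:-1]):
        return None
    # The last line may be shorter, but not empty or longer.
    if not 0 < len(chunks[-1]) <= width:
        return None
    return b"".join(chunks), width, trailing


def restore_periodic(payload: bytes, width: int, trailing: bool,
                     delim: bytes = b"\n") -> bytes:
    """Re-insert delim after every `width` bytes of payload."""
    lines = [payload[i:i + width] for i in range(0, len(payload), width)]
    return delim.join(lines) + (delim if trailing else b"")
```

The payload then goes to whatever general-purpose compressor you like. If the check returns `None`, the data is passed through unchanged.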

vintermann a day ago | parent | next [-]

That turns it into specialized compression, which DNA already has plenty of. Many forms of specialized compression even allow string-related queries directly on the compressed data.

rini17 14 minutes ago | parent [-]

There are plenty of data formats where data is interspersed with fixed delimiters at fixed intervals.

bede 3 days ago | parent | prev [-]

Yes, it sounds like 7-Zip/LZMA can do this using custom filters, among other more exotic (and slow) statistical compression approaches.
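
For a feel of what a filter chain looks like, here is a small sketch using Python's standard `lzma` module (the xz/liblzma implementation, not 7-Zip itself). It puts the built-in delta filter in front of LZMA2. Note that the delta filter does not remove delimiters; it is just the closest stock filter for data with a fixed stride. The helper name `compress_fixed_stride` and the choice of stride are assumptions for illustration:

```python
import lzma


def compress_fixed_stride(data: bytes, stride: int, preset: int = 6) -> bytes:
    """Compress with a delta filter (distance = stride) followed by LZMA2."""
    filters = [
        # Delta filter: each byte is stored as the difference from the byte
        # `dist` positions earlier (liblzma allows dist from 1 to 256).
        {"id": lzma.FILTER_DELTA, "dist": stride},
        # LZMA2 must be the last filter in the chain.
        {"id": lzma.FILTER_LZMA2, "preset": preset},
    ]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


# Round trip: the .xz container records the filter chain, so plain
# lzma.decompress() can undo it without being told the stride.
records = b"".join(b"ACGT\n" for _ in range(1000))
blob = compress_fixed_stride(records, stride=5)
assert lzma.decompress(blob) == records
```

Whether this actually helps depends on the data, so it is worth comparing the output size against plain LZMA2 before adopting it.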