| ▲ | dleeftink 8 hours ago | |
Just as with redacted documents (consistently blocked terms) or bad OCR jobs (wrong or missing characters), a result where only a certain percentage comes out unmangled is still more readable than having no data at all. With a stable base corpus and some dynamic programming you can clean up the remainder[0]. | ||
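A minimal sketch of what "a stable base corpus and some dynamic programming" could look like, assuming the corpus is reduced to a plain set of known words and the dynamic programming is ordinary Levenshtein edit distance. The names `edit_distance`, `repair_token`, `repair_text` and the `max_distance` threshold are hypothetical, chosen for illustration; this is not necessarily the approach the linked footnote describes.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a DP table kept one row at a time."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


def repair_token(token: str, vocabulary: set[str], max_distance: int = 2) -> str:
    """Replace a token with its nearest vocabulary word, if one is close enough."""
    if not vocabulary or token in vocabulary:
        return token
    # Ties are broken alphabetically so the result is deterministic.
    best = min(vocabulary, key=lambda word: (edit_distance(token, word), word))
    return best if edit_distance(token, best) <= max_distance else token


def repair_text(text: str, vocabulary: set[str], max_distance: int = 2) -> str:
    """Repair each whitespace-separated token independently."""
    return " ".join(repair_token(t, vocabulary, max_distance) for t in text.split())


if __name__ == "__main__":
    vocab = {"the", "quick", "brown", "fox"}
    print(repair_text("the qu1ck brwn fox", vocab))
```

Scanning the whole vocabulary per token is fine for a toy corpus; a real one would probably want an index (a trie or BK-tree, say) and word frequencies to break ties.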
| ▲ | mkl 6 hours ago | parent [-] | |
The problem is that you often can't tell which bits are unmangled. OCR systems will happily give you plausible but wrong readings, and even some scanners/copiers will change things: https://dkriesel.com/en/blog/2013/0802_xerox-workcentres_are... | ||
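To make that concrete, here is a small sketch assuming a hypothetical validity check that accepts any token found in a known vocabulary or made up purely of digits. The function name `flag_suspect_tokens`, the vocabulary, and the sample sentences are all invented for illustration. An obviously broken token gets flagged, but a reading that is wrong yet well-formed passes untouched.

```python
def flag_suspect_tokens(tokens: list[str], vocabulary: set[str]) -> list[str]:
    """Return tokens that are neither known words nor plain numbers."""
    return [t for t in tokens if t.lower() not in vocabulary and not t.isdigit()]


# Hypothetical vocabulary and sample readings, not taken from any real scan.
VOCAB = {"the", "invoice", "total", "was", "dollars"}

truth = "the invoice total was 1800 dollars".split()
obviously_mangled = "the inv0ice total was 1800 dollars".split()
plausibly_mangled = "the invoice total was 1300 dollars".split()  # 8 read as 3

if __name__ == "__main__":
    print(flag_suspect_tokens(obviously_mangled, VOCAB))  # ['inv0ice']
    print(flag_suspect_tokens(plausibly_mangled, VOCAB))  # []
    print(plausibly_mangled == truth)                     # False
```

The second reading differs from the original, yet nothing in the text itself marks it as wrong, so a corpus-based cleanup has nothing to latch onto.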