mc32 8 hours ago
True, but as with regular document-scanning software, there can be errors in detection.
dleeftink 8 hours ago
Just as with redacted documents (consistently blocked terms) or bad OCR jobs (wrong or missing characters), even if only a certain percentage comes out unmangled, the result is more readable than having no data at all. A stable base corpus and some dynamic programming will let you clean up the remainder [0].
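One common reading of "dynamic programming" in this context is edit-distance matching: compute the Levenshtein distance between each garbled token and the words of a clean base corpus, then snap the token to the nearest word. The sketch below assumes that approach; the function names, the whitespace tokenization, the lowercasing, and the `max_dist=2` cutoff are hypothetical choices for illustration, not taken from [0].

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic DP table, keeping only two rows."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if chars match)
            ))
        prev = curr
    return prev[-1]


def build_vocab(corpus: str) -> Counter:
    """Word frequencies from a trusted, clean base corpus (hypothetical input)."""
    return Counter(corpus.lower().split())


def correct_token(token: str, vocab: Counter, max_dist: int = 2) -> str:
    """Snap a token to its closest vocab word; ties go to the more frequent word."""
    word = token.lower()
    if word in vocab or not vocab:
        return word
    best = min(vocab, key=lambda w: (edit_distance(word, w), -vocab[w]))
    # Leave the token alone if even the best match is too far away.
    return best if edit_distance(word, best) <= max_dist else word


def clean_text(text: str, vocab: Counter, max_dist: int = 2) -> str:
    """Correct every whitespace-separated token (output is lowercased)."""
    return " ".join(correct_token(t, vocab, max_dist) for t in text.split())


vocab = build_vocab("the quick brown fox jumps over the lazy dog")
print(clean_text("tbe qu1ck brwn fox jumps ovcr the 1azy dog", vocab))
# prints: the quick brown fox jumps over the lazy dog
```

A real cleanup pass would also use context (for example word n-gram frequencies from the same corpus) to choose between equally close candidates; this sketch only breaks ties by single-word frequency.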
selcuka 2 hours ago
Yeah. There was a weird Xerox printer bug that swapped digits (turning 6s into 8s) in scanned documents, caused by lossy compression in the JBIG2 image format [1].

[1] https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...
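A toy model of how this kind of compression can swap characters: JBIG2's lossy mode stores a dictionary of glyph bitmaps and lets later glyphs reuse a stored one if they look "close enough". The sketch below assumes a plain pixel-count (Hamming) threshold and made-up 5x3 digit bitmaps; real JBIG2 encoders use more elaborate matching, so this only illustrates the failure mode, not the actual Xerox implementation.

```python
Glyph = tuple[str, ...]  # rows of "0"/"1" pixels

# Hypothetical 5x3 bitmaps; "6" and "8" differ by a single pixel.
GLYPHS: dict[str, Glyph] = {
    "8": ("111", "101", "111", "101", "111"),
    "6": ("111", "100", "111", "101", "111"),
    "1": ("010", "110", "010", "010", "111"),
}


def hamming(a: Glyph, b: Glyph) -> int:
    """Number of pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def compress(glyphs: list[Glyph], threshold: int) -> tuple[list[Glyph], list[int]]:
    """Reuse the first stored symbol within `threshold` pixels, else store a new one."""
    dictionary: list[Glyph] = []
    refs: list[int] = []
    for g in glyphs:
        match = next((i for i, s in enumerate(dictionary)
                      if hamming(g, s) <= threshold), None)
        if match is None:
            dictionary.append(g)
            match = len(dictionary) - 1
        refs.append(match)
    return dictionary, refs


def decompress(dictionary: list[Glyph], refs: list[int]) -> list[Glyph]:
    """Rebuild the page by looking up each reference in the symbol dictionary."""
    return [dictionary[i] for i in refs]


def read_digits(glyphs: list[Glyph]) -> str:
    """Map bitmaps back to characters for display."""
    lookup = {bitmap: ch for ch, bitmap in GLYPHS.items()}
    return "".join(lookup[g] for g in glyphs)


scanned = [GLYPHS[d] for d in "816"]
for threshold in (0, 1):
    dictionary, refs = compress(scanned, threshold)
    print(threshold, read_digits(decompress(dictionary, refs)))
# prints:
# 0 816
# 1 818
```

With an exact-match threshold the digits survive, but real scans are noisy, so a lossy encoder needs some tolerance to compress at all; once that tolerance exceeds the gap between two glyphs, the later "6" silently becomes a copy of the earlier "8", and the output still looks perfectly crisp.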