dperfect 21 days ago
Letting Claude work a little longer produced this behemoth of a script (which is supposed to be somewhat universal in correcting similar OCR'd PDFs, though it has not yet been tested on any others): https://pastebin.com/PsaFhSP1

It uses this Rust zlib stream fixer: https://pastebin.com/iy69HWXC

It gives the best output I've seen it produce: https://imgur.com/itYWblh

This is using the same OCR'd text posted by commenter Joe.
daveguy 21 days ago
> which is supposed to be somewhat universal in correcting similar OCR'd PDFs

Xerox would like a word. https://news.ycombinator.com/item?id=29223815

Point being, "correcting" to "correct-looking" may be worse than just accepting errors. Humans can often spot an uncorrected error because it reads as a nonsense word. "Correcting" OCR can produce plausible but wrong results that are more difficult for the human in the loop to identify.