freedomben 3 hours ago

I can't say for Project Gutenberg specifically, but in general a huge issue I see is OCR errors. What do you all do to address OCR?

gluejar 3 hours ago

Check out Distributed Proofreaders: https://pgdp.net

jfengel 15 minutes ago

I didn't realize DP was still around. I used to do it quite a bit, 15 years ago, but OCR has improved considerably since then.

lapetitejort 3 hours ago

I uploaded a PDF to archive.org that auto-OCRs with plenty of mistakes. I have found no way of updating the entire stack of documents produced. I wonder if Project Gutenberg is similar.