pronoiac · 6 days ago
I made a high-quality scan of PAIP (Paradigms of Artificial Intelligence Programming) and worked on OCR'ing it and incorporating that into an admittedly imperfect git repo of Markdown files. I used Scantailor to deskew and make other adjustments before applying Tesseract, via OCRmyPDF. I wrote notes on some of my process at https://github.com/norvig/paip-lisp/releases/tag/v1.2 . I'd also tried ocrit, which uses Apple's Vision framework for OCR, with some success - https://github.com/insidegui/ocrit

It's an ongoing, iterative process. I'll watch this thread with interest. Some recent threads that might be helpful:

* https://news.ycombinator.com/item?id=42443022 - Show HN: Adventures in OCR
* https://news.ycombinator.com/item?id=43045801 - Benchmarking vision-language models on OCR in dynamic video environments - driscoll42 posted some stats from research
* https://news.ycombinator.com/item?id=43043671 - OCR4all

(Meaning, I have these browser tabs open; I haven't fully digested them yet.)
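A minimal sketch of the Tesseract-via-OCRmyPDF step described above, using OCRmyPDF's Python API. The file names are placeholders, the option choices are assumptions, and the Scantailor pre-processing mentioned in the comment is assumed to have already happened:

```python
import ocrmypdf

# Run Tesseract over a cleaned-up scan via OCRmyPDF, producing a searchable
# PDF plus a plain-text sidecar that can feed a later Markdown conversion.
ocrmypdf.ocr(
    "paip-scan.pdf",         # input: the pre-processed scan (placeholder name)
    "paip-ocr.pdf",          # output: PDF with an embedded OCR text layer
    language="eng",          # Tesseract language pack to use
    deskew=True,             # straighten slightly rotated pages
    rotate_pages=True,       # fix pages scanned sideways or upside down
    sidecar="paip-ocr.txt",  # also dump the raw recognized text to a file
)
```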
lherron · 6 days ago
Also this: https://news.ycombinator.com/item?id=42952605 - Ingesting PDFs and why Gemini 2.0 changes everything
kingkongjaffa · 6 days ago
Was technology the right approach here? Is it essentially done now? I couldn't tell whether it was ever completed. I can't help but think a few amateur humans could have read the PDF with their own eyes and written the Markdown by hand if the OCR was a little sketchy.