bambax 3 days ago
OCR is fascinating; I did some experiments on OCR for an ancient French book that made it to HN last year: https://news.ycombinator.com/item?id=42443022 I found that at the time no LLM was able to properly organize the text and understand the footnote structure, but non-AI OCR works very well, and restructuring (with some manual input) is largely feasible. I'd be interested in what you can do with those footnotes (including, for good measure, footnotes-within-footnotes).

Regarding feeding text to LLMs, it seems they are often able to make sense of text when the layout follows the original, which means the OCR phase doesn't necessarily need to understand the structure of the source: rendering the text in a layout that mirrors the page can be sufficient. I worked on setting up a service that would do just that but in the end didn't go live with it; here's the examples page to show what I mean: https://preview.adgent.com/#examples This approach is very straightforward and rarely fails.
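For anyone who wants to try the layout-preserving trick, here's a minimal sketch of the idea (not the service above, just an illustration): it takes Tesseract's word-level bounding boxes via pytesseract and pads each word out to an approximate character column, so the plain text handed to the LLM roughly mirrors the page geometry. CHAR_WIDTH and LINE_HEIGHT are guesses that would need tuning per scan, and "page.png" is a hypothetical input.

    # Minimal sketch of layout-preserving OCR output (illustration only).
    # Assumes Tesseract and pytesseract are installed; CHAR_WIDTH and
    # LINE_HEIGHT are rough assumptions for a given scan.
    import pytesseract
    from PIL import Image

    CHAR_WIDTH = 10   # assumed average glyph width in pixels
    LINE_HEIGHT = 20  # assumed line height in pixels

    def render_layout(image_path):
        data = pytesseract.image_to_data(
            Image.open(image_path), output_type=pytesseract.Output.DICT
        )
        # Bucket words into rows by vertical position, keep their column.
        rows = {}
        for i, word in enumerate(data["text"]):
            if not word.strip():
                continue
            row = data["top"][i] // LINE_HEIGHT
            col = data["left"][i] // CHAR_WIDTH
            rows.setdefault(row, []).append((col, word))
        # Rebuild each row as plain text, padding to the word's column.
        lines = []
        for row in sorted(rows):
            line = ""
            for col, word in sorted(rows[row]):
                line = line.ljust(col) + word + " "
            lines.append(line.rstrip())
        return "\n".join(lines)

    print(render_layout("page.png"))  # hypothetical input file

In practice you'd derive CHAR_WIDTH from the median word-box width instead of hard-coding it, but this is enough to show why columns, marginal notes and footnotes stay visually separate in the text the LLM sees.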