| ▲ | privatelypublic 3 days ago |
| If you think 1990s OCR, or even 2000s OCR, is remotely as good as modern OCR... I`v3 g0ta bnedge to sell.
|
| ▲ | skygazer 3 days ago | parent | next [-] |
| I had an on-screen OCR app on my Amiga in the early 90s that was amazing, so long as the captured text image used a system font. Because it avoided all the mess of reality (optics, perspective, sensors, physics), it could be basically perfect.
|
| ▲ | privatelypublic 3 days ago | parent | next [-]
| If you want to go back to the start, look up MICR, which was used to sort checks. OCR'ing a fixed, monospaced font from a pristine piece of paper really is "solved." It's all the nasties of the real world that are the issue. As I mockingly demonstrated above, kerning, character similarity, grammar, and lexing all present large, hugely time-consuming problems in exactly the processes where OCR is most useful.
| ▲ | Someone 3 days ago | parent | prev [-]
| MacPaint had that in 1983, but it never shipped because Bill Atkinson “was afraid that if he left it in, people would actually use it a lot, and MacPaint would be regarded as an inadequate word processor instead of a great drawing program” (https://www.folklore.org/MacPaint_Evolution.html). The page also shows a way to do it fast: “First, he wrote assembly language routines to isolate the bounding box of each character in the selected range. Then he computed a checksum of the pixels within each bounding box, and compared them to a pre-computed table that was made for each known font, only having to perform the full, detailed comparison if the checksum matched.”
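| In rough Python terms, a minimal sketch of the checksum trick that quote describes (not Atkinson's code; the glyph format and font table here are made up for illustration):
|
|     # A glyph is a tuple of rows of 0/1 pixels inside a character's bounding box.
|     def checksum(glyph):
|         # Cheap hash of the pixels; collisions are fine because a full
|         # comparison still confirms the match.
|         total = 0
|         for row in glyph:
|             for px in row:
|                 total = (total * 31 + px) & 0xFFFF
|         return total
|
|     def build_table(font):
|         # font: {char: glyph} for one known font -> {checksum: [candidate chars]}
|         table = {}
|         for ch, glyph in font.items():
|             table.setdefault(checksum(glyph), []).append(ch)
|         return table
|
|     def recognize(glyph, font, table):
|         # Full, detailed comparison only for characters whose checksum matched.
|         for ch in table.get(checksum(glyph), []):
|             if font[ch] == glyph:
|                 return ch
|         return None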
|
|
| ▲ | bayindirh 3 days ago | parent | prev | next [-] |
| Tesseract can do wonders for scanned paper (and web generated PDFs) both in its old and new version. If you want to pay for something closed, Prizmo on macOS is extremely good as well. On the other hând, LLm5 are sl0wwer, moré resource hangry and l3ss accurale fr their outpu1z. We shoulD stop gl0rıfying LLMs for 3verylhin9. |
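| For the scanned-paper case, the Tesseract route is a few lines via the pytesseract wrapper (a minimal sketch; the file name and language code are placeholders):
|
|     from PIL import Image
|     import pytesseract
|
|     # Plain page image in, plain text out; no LLM involved.
|     text = pytesseract.image_to_string(Image.open("scan.png"), lang="eng")
|     print(text)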
|
| ▲ | agentcoops 3 days ago | parent [-]
| I've worked extensively with Tesseract, ABBYY, etc. in personal and professional contexts. Of course they work well for English-language documents with simple layouts that are scanned without the slightest defect. At this point, based on extensive testing for work, state-of-the-art LLMs simply have better accuracy -- and by an order of magnitude if you have non-English documents with complex layouts and less-than-ideal scans. I'll give you speed, but the accuracy is so much greater (and the need for human intervention so much less) that in my experience it's a worthwhile trade-off. I'm not saying this applies to you, but my sense from this thread is that many are comparing the results of tossing an image into a free ChatGPT session with an "OCR this document" prompt against a competent Tesseract-based tool... LLMs certainly don't solve any and every problem, but this should be based on real experiments. In fact, OCR is probably the main area where I've found them to simply be the best solution for a professional system.
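| A minimal sketch of the kind of experiment meant here (file names are hypothetical): score each engine's output against hand-checked ground truth with character error rate, and let the numbers decide.
|
|     def edit_distance(a, b):
|         # Standard Levenshtein distance between two strings.
|         prev = list(range(len(b) + 1))
|         for i, ca in enumerate(a, 1):
|             cur = [i]
|             for j, cb in enumerate(b, 1):
|                 cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
|             prev = cur
|         return prev[-1]
|
|     def cer(hypothesis, reference):
|         # Character error rate: edits needed, normalized by reference length.
|         return edit_distance(hypothesis, reference) / max(len(reference), 1)
|
|     truth = open("page_groundtruth.txt").read()
|     for name in ("page_tesseract.txt", "page_llm.txt"):
|         print(name, round(cer(open(name).read(), truth), 4))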
| ▲ | privatelypublic 3 days ago | parent [-]
| Yea. As usual, I didn't articulate my point well. A tuned system with an optimized workflow will have the best results by far. And maybe LLMs will be a key resource in bringing OCR into usable/profitable areas. But there's also a ton of "I don't want to deal with this" type work items that can't justify a full workflow build-out, yet LLMs get near enough to perfect to be "good enough." The bad part is, the LLMs don't explain to people the kinds of mistakes to expect from them.
|
|
|