| ▲ | ddtaylor 5 hours ago | |
How does this compare to Tesseract? | ||
| ▲ | newzino 3 hours ago | parent [-] | |
Different tools for different jobs. Tesseract is free, runs on CPU, and handles clean printed text well. For standard documents with simple layouts, it's hard to beat. Where it falls apart is complex pages: multi-column layouts, tables, equations, handwriting. Tesseract recognizes text line by line with little understanding of page structure, so a two-column paper can get garbled into interleaved text. VLM-based models like DeepSeek treat the page as an image and infer structure visually, which handles those cases much better. For this specific use case (stats textbook with heavy math), Tesseract would really struggle with the equations. LaTeX-rendered math has unusual character spacing and stacked symbols that confuse traditional OCR engines. The author chose DeepSeek specifically because it outputs markdown with math notation intact. The tradeoff is cost and infrastructure. Tesseract runs on your laptop for free. The author spent $2 on A100 GPU time for 600 pages, which works out to roughly $0.003/page. For a one-off textbook that's nothing, but at scale the difference between "free on CPU" and "$0.003/page on GPU" matters. Newer alternatives like dots and olmOCR (mentioned upthread by kbyatnal) are also worth comparing if accuracy on complex layouts is the priority. | ||
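To make the cost point concrete, here is a rough back-of-the-envelope sketch. It takes the $2 for 600 pages figure from the comment and assumes the per-page rate stays flat as volume grows, which real GPU pricing may not do. The function names and the page counts in the loop are made up for illustration.

```python
def cost_per_page(total_cost_usd: float, pages: int) -> float:
    """Average cost per page for a finished OCR run."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return total_cost_usd / pages


def projected_cost(total_cost_usd: float, pages: int, target_pages: int) -> float:
    """Linear extrapolation to a larger job (assumes a constant per-page rate)."""
    return cost_per_page(total_cost_usd, pages) * target_pages


if __name__ == "__main__":
    # Figures quoted in the comment: $2 of A100 time for 600 pages.
    rate = cost_per_page(2.0, 600)
    print(f"GPU rate: ${rate:.4f}/page")

    # Hypothetical job sizes, chosen only to show how the gap grows.
    for n in (600, 100_000, 10_000_000):
        print(f"{n:>10} pages: ${projected_cost(2.0, 600, n):,.2f} on GPU vs $0 on CPU")
```

The CPU side is shown as $0 only because the comment treats Tesseract on a laptop as free; it ignores electricity and wall-clock time.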