We Benchmarked Mistral and Landing AI vs. Docsumo (docsumo.com)
1 point by snehanairdoc 7 days ago
We recently ran a benchmark comparing Docsumo's in-house OCR engine (which powers our Intelligent Document Processing platform) against Mistral OCR and Landing AI's agentic document extraction.

Why? There's a lot of hype around generative OCR tools lately, and we were curious how they actually perform on real-world documents: messy invoices, noisy scans, bank statements, multilingual forms, and low-res PDFs.

What we tested:

>Text extraction quality (layout preservation, character-level accuracy)

>Performance on tables, charts, and vertically aligned text

>Impact on downstream tasks like structured data extraction (using GPT-4o)

>Latency

What we found: Docsumo's native OCR consistently outperformed both Mistral and Landing AI on layout fidelity, hallucination rate, and downstream extraction accuracy. Mistral is fast and cheap, but it missed large chunks of content and hallucinated on noisy inputs. Landing AI did better, but it often paraphrased or mislabeled data, especially in multilingual or structured formats.

We've made the benchmark results public (including side-by-side OCR outputs): https://huggingface.co/spaces/docsumo/ocr-results

And here's the full breakdown with methodology, dataset, and observations: https://www.docsumo.com/blogs/ocr/docsumo-ocr-benchmark-repo...

This is just the first benchmark in a larger series: we're working on evaluating more IDP/OCR tools next. If you're building workflows around document automation, or you're curious about the current state of OCR performance, we'd love your thoughts and feedback.
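For anyone curious what a "character-level accuracy" metric looks like in practice, here's a minimal sketch of a character error rate (CER) computed from edit distance. The post doesn't publish Docsumo's actual scoring code, so the function names and normalization choice here are illustrative assumptions, not their implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

def cer(reference: str, ocr_output: str) -> float:
    """Edit distance normalized by reference length; 0.0 means perfect."""
    if not reference:
        return 0.0 if not ocr_output else 1.0
    return levenshtein(reference, ocr_output) / len(reference)

print(cer("Invoice #1042", "Invoice #1042"))  # 0.0
```

A common variant normalizes by the length of the longer string instead, which bounds CER at 1.0 even when the OCR output is much longer than the reference (e.g. when a model hallucinates extra text).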