	▲	visarga 3 days ago
		The crucial information is missing: an accuracy comparison with other OCR providers. In my experience, LLM-based OCR can misread the layout and hallucinate values; the errors are very subtle but sometimes critically wrong. Classical OCR is more precise but doesn't capture the layout at all. Combining both has its own issues; no approach is 100% reliable.
	▲	agentcoops 3 days ago
		Have you evaluated this lately? Last year, or even earlier this year, I would have mostly agreed with you. At this point, however, at least with the documents I have been working on, OCR reliability with GPT-5 or Mistral OCR [1] has been much better than even domain-trained classical OCR. If the documents have even a slightly complex layout (to say nothing of page numbers, page headings, or an uncommon font), the accuracy of state-of-the-art LLMs has, in my work, been an order of magnitude greater. The ability to have the LLM tentatively combine trailing sentences across pages, which is especially useful if you have to work with documents in German, say, is invaluable. [1] https://mistral.ai/news/mistral-ocr
	▲	WithinReason 3 days ago
		Breaking up the page, feeding the pieces one by one, and reassembling the output helps with that. I was expecting this project to do that, but it can only feed a whole page.
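For illustration, the split/feed/reassemble idea can be sketched roughly as below. This is only a minimal sketch under assumptions, not what the project or any OCR provider does: the page is cut into full-width horizontal bands, `ocr_tile` is a hypothetical callback standing in for whatever OCR or LLM call you use, and the names `tile_boxes` and `ocr_by_tiles` are made up here.

```python
from typing import Callable, List, Tuple

# (left, top, right, bottom) in pixels, same convention as most image libraries
Box = Tuple[int, int, int, int]


def tile_boxes(width: int, height: int, tile_h: int, overlap: int = 0) -> List[Box]:
    """Split a page into full-width horizontal bands, ordered top to bottom.

    `overlap` is how many pixel rows consecutive bands share, so that a text
    line cut by one band boundary is still whole in the neighbouring band.
    """
    if tile_h <= 0 or not 0 <= overlap < tile_h:
        raise ValueError("need tile_h > 0 and 0 <= overlap < tile_h")
    boxes: List[Box] = []
    top = 0
    while top < height:
        bottom = min(top + tile_h, height)
        boxes.append((0, top, width, bottom))
        if bottom == height:
            break
        top = bottom - overlap  # always advances because overlap < tile_h
    return boxes


def ocr_by_tiles(
    width: int,
    height: int,
    ocr_tile: Callable[[Box], str],  # hypothetical: crops the box and returns its text
    tile_h: int,
    overlap: int = 0,
) -> str:
    """Feed each band to `ocr_tile` one by one and join the text in reading order."""
    pieces = [ocr_tile(box).strip() for box in tile_boxes(width, height, tile_h, overlap)]
    return "\n".join(p for p in pieces if p)
```

With a non-zero `overlap`, text in the shared rows will come back twice, so a real pipeline would need a de-duplication step when reassembling; that part is left out here. Multi-column layouts would also need a smarter split than horizontal bands.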
	▲	worldsayshi 3 days ago
		Yes, I tried using an LLM for reading CVs a while back, and I really struggled to get it not to omit important information.
	▲	smusamashah 3 days ago
		Is there any tool that takes a scanned PDF and overlays the OCRed text on the scan so that the text becomes searchable?