Remix clone Hacker News

	▲	agentcoops 3 days ago
		100%. My sense is that many in this thread have never gone through the misery of trying to use classical OCR on non-English documents or on scans whose quality you can't control. I recently ran a test with 18th-century German documents, written in a well-known and standardized but archaic script. The accuracy of classical models trained specifically on this corpus was an order of magnitude lower than GPT-5's. I haven't experimented personally or professionally with smaller models, but your experience makes me hopeful that we might get OCR this accurate on phones sooner rather than later...
	▲	bugglebeetle 3 days ago
		William Mattingly has been doing a lot of work on similar documents in an archival context with VLLMs. You should check in on their work: https://x.com/wjb_mattingly https://github.com/wjbmattingly