These OCR improvements will almost certainly be brought to Google Books, which is great. Long term, it could enable compressing all non-digital rare books into a manageable size that can be stored for less than $5,000.[0] It would also be great for archive.org to move to this from Tesseract. I wonder what that would cost, both in raw compute to run it and via a paid API.

[0] https://annas-archive.org/blog/critical-window.html

levocardia 2 hours ago | parent | next [-]

This is a really interesting "data flywheel" -- better model >> more usable data >> even better model

tills13 an hour ago | parent [-]

Surely there's an upper limit to this, though, with models literally eating themselves.

jeffbee 35 minutes ago | parent [-]

When a human student learns to read more carefully, we don't consider that a negative.

kridsdale3 4 hours ago | parent | prev [-]

More Data for the Data Gods!