| ▲ | papercrane 2 days ago |
| One of the reasons Calibri was selected over Times New Roman was that it has a lower rate of OCR transcription errors, which makes documents set in it easier to use for people who rely on screen readers. |
|
| ▲ | blueflow 2 days ago | parent | next [-] |
| Do you have a link for that? OCR should be more reliable with Times New Roman because of its prominent serifs. |
| |
| ▲ | orwin 2 days ago | parent | next [-] | | I don't have a link for that, but the main difficulty with OCR isn't the OCR part (not anymore, at least), it's the "clean up" part, and serifs are a pain in the ass, especially on slightly crumpled paper. My use case was an ERP plugin that digitized and read receipts to autofill reimbursement requests. Since most receipts use sans-serif fonts, it was mostly fine, but some jokers use serif fonts (mostly on receipts you get when paying cash, not on credit card receipts), and the error rate jumped from something like 1% to 13%. (I'm not sure about the 1%; it might be a story I told myself to feel better. It was a decade ago, before I pivoted from AI to networking. I always make the best decisions, it seems.) | |
| ▲ | nerevarthelame 2 days ago | parent | prev | next [-] | | I don't know what studies Blinken's State Department considered, but here are 2 studies on the matter. https://www.academia.edu/72263493/Effect_of_Typeface_Design_...: "For Latin, it was observed that individual letters with serif cause misclassification on (b,h), (u,n), (o,n), (o,u)." https://par.nsf.gov/servlets/purl/10220037: [Figure 5 shows higher accuracy for the two sans-serif fonts, Arial and DejaVu compared to Times New Roman, across all OCR engines] | |
| ▲ | papercrane 2 days ago | parent | prev [-] | | The memo at the time said the serifs can cause OCR issues. https://x.com/John_Hudson/status/1615486871571935232 | | |
| ▲ | opo 2 days ago | parent [-] | | Just because they claimed it doesn't make it true. OCR and screen reader software in 2023 did not have problems with serifs. | | |
|
|
|
| ▲ | carlosjobim 2 days ago | parent | prev [-] |
| That doesn't make much sense, since a typewriter will type neither Calibri nor Times New Roman. And OCR should only be needed for typewritten documents, because any document made with Calibri or TNR is already digital. |
| |
| ▲ | contact9879 2 days ago | parent | next [-] | | printed documents, images, horribly inaccessible pdfs, horribly inaccessible websites | | |
| ▲ | carlosjobim 2 days ago | parent [-] | | > Printed documents
- Use the original, which is digital.
> Images
- Use the original, which is digital.
> horribly inaccessible pdfs
- Use the original, which has real text in the PDF.
> horribly inaccessible websites
- All text on any website is digital. Nobody uses OCR on a website.
A massive paper producer like the government shouldn't adapt its typesetting to people who are using technology wrongly. | | |
| |
| ▲ | funnybeam 2 days ago | parent | prev [-] | | We have a process at work where clients export information from their database as a PDF, which they email to us so that we can OCR it and insert it into our database. No one else seems to think this is batshit insane. |
|