blueflow 2 days ago
Link on that? OCR should be more reliable with Times New Roman because of its significant serifs.
orwin 2 days ago
I don't have a link on that, but the main difficulty with OCR isn't the OCR part (not anymore, at least), it's the "clean up" part, and serifs are a pain in the ass, especially on slightly crumpled paper. My use case was an ERP plugin that digitized and read receipts to autofill reimbursement requests. Since most receipts use sans-serif fonts, it was mostly fine, but some jokers use serifed fonts (mostly on receipts you get when paying cash, not credit card receipts), and the error rate jumped from like 1% to 13%. (Not sure about the 1%; it might be a story I told myself to make me feel better. It was a decade ago, before I pivoted from AI to networking. I always make the best decisions, it seems.)
nerevarthelame 2 days ago
I don't know what studies Blinken's State Department considered, but here are two studies on the matter. https://www.academia.edu/72263493/Effect_of_Typeface_Design_...: "For Latin, it was observed that individual letters with serif cause misclassification on (b,h), (u,n), (o,n), (o,u)." https://par.nsf.gov/servlets/purl/10220037: [Figure 5 shows higher accuracy for the two sans-serif fonts, Arial and DejaVu, compared to Times New Roman, across all OCR engines]
papercrane 2 days ago
The memo at the time said the serifs can cause OCR issues.