| ▲ | Terr_ 6 hours ago |
| > "OCR for construction documents does not work" I'm reminded of the Xerox JBIG2 bug back in ~2013, where certain scan settings could silently replace numbers inside documents, and bad construction plans were one of the cases that led to it being discovered. [0] It wasn't overt OCR per se, since end users weren't intending to convert pixels to characters or vice versa. [0] https://www.youtube.com/watch?v=c0O6UXrOZJo&t=6m03s |
|
| ▲ | TehCorwiz 6 hours ago | parent | next [-] |
| If I recall correctly, it was an artifact of the compression algorithm. Full context and details: https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres... |
|
| ▲ | hackcasual 3 hours ago | parent | prev [-] |
| JBIG2 does glyph binning, which is, as you say, not exactly OCR, but similar. So chunks of the image that look sufficiently similar get replaced with a reference to a single instance. |
|
| ▲ | thaumasiotes an hour ago | parent [-] |
| > "not exactly OCR, but similar. So chunks of the image that look sufficiently similar get replaced with a reference to a single instance." How can we describe OCR that wouldn't match this definition exactly? |
|
| ▲ | Dylan16807 an hour ago | parent [-] |
| JBIG2 dynamically pulls reference chunks out of the image, which makes it more likely to have insufficient separation between the target shapes. It also gives a false sense of security when it displays dirty pixels that still clearly show a specific digit, since you think you're basically looking at the original. |
|
| ▲ | thaumasiotes 43 minutes ago | parent [-] |
| That's a description of JBIG2, not a description of OCR. JBIG2 is an OCR algorithm that doesn't assume the document comes from a pre-existing alphabet. |
|
| ▲ | Dylan16807 20 minutes ago | parent [-] |
| You asked what the difference was, and I said the difference. Was it unclear that, to fit the phrasing of your question, we add "OCR doesn't"? I would not personally call JBIG2 OCR. |
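
To make the "glyph binning" idea from the thread concrete, here is a toy sketch of matching chunks against a dictionary built from the page itself. It is not the JBIG2 algorithm or any Xerox implementation: the 3x5 bitmaps, the Hamming-distance test, and the names `bin_glyphs`, `reconstruct`, and `max_diff` are all assumptions made up for illustration. It only shows how a loose similarity threshold can swap one digit for another while the output still looks like a clean scan.

```python
# Toy glyph binning: every chunk is compared with symbols already pulled
# from the same page, and a close-enough chunk is stored as a reference.
# Hypothetical illustration only, not the real JBIG2 matching procedure.

SIX = ("###",
       "#..",
       "###",
       "#.#",
       "###")
EIGHT = ("###",
         "#.#",
         "###",
         "#.#",
         "###")
ONE = ("..#",
       "..#",
       "..#",
       "..#",
       "..#")


def hamming(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def bin_glyphs(glyphs, max_diff):
    """Return (dictionary, refs); refs[i] indexes the symbol used for glyph i."""
    dictionary = []
    refs = []
    for glyph in glyphs:
        for index, symbol in enumerate(dictionary):
            if hamming(glyph, symbol) <= max_diff:
                refs.append(index)  # reuse an existing symbol
                break
        else:
            dictionary.append(glyph)  # first sighting becomes the reference
            refs.append(len(dictionary) - 1)
    return dictionary, refs


def reconstruct(dictionary, refs):
    """Rebuild the page from the dictionary, as a decoder would."""
    return [dictionary[i] for i in refs]


page = [SIX, EIGHT, ONE, SIX]

loose_dict, loose_refs = bin_glyphs(page, max_diff=1)
strict_dict, strict_refs = bin_glyphs(page, max_diff=0)

# With the loose threshold the 8 is stored as a reference to the 6.
print(loose_refs, reconstruct(loose_dict, loose_refs)[1] == SIX)
# With exact matching only, the 8 keeps its own symbol.
print(strict_refs, reconstruct(strict_dict, strict_refs)[1] == EIGHT)
```

In this sketch the 6 and the 8 differ by a single pixel, so `max_diff=1` merges them and the rebuilt page shows a crisp 6 where the original had an 8. An OCR system with a fixed alphabet would instead compare each chunk against known character shapes rather than against whatever happened to appear first on the page.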
|
|
|
|