shbooms 5 hours ago
Oftentimes you will have requirements that the documents you release be digitally searchable, so in those cases this would not be an option.
pottertheotter 4 hours ago
This made me think of something I came across recently that's almost the opposite of requiring PDFs to be searchable. A local government would publish PDFs where the text is clearly readable on screen, but the selectable text layer is intentionally scrambled, so copy/paste or search returns garbage. It's a very hostile thing to do, especially with public data!
8note 5 hours ago
Run some OCR on them afterwards to recreate the text layer?
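For anyone wanting to try the OCR route, here is a rough sketch in Python that shells out to the OCRmyPDF command-line tool. It assumes `ocrmypdf` is installed and on your PATH; the helper names `build_ocr_command` and `rebuild_text_layer` are made up for illustration. Without the flag, OCRmyPDF adds a text layer to image-only pages; with `--force-ocr` it rasterizes every page and OCRs it again, which should also replace a scrambled text layer, at the cost of losing vector content.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(src: Path, dst: Path, force: bool = False) -> list[str]:
    """Build the ocrmypdf invocation (hypothetical helper, not part of OCRmyPDF)."""
    cmd = ["ocrmypdf"]
    if force:
        # --force-ocr rasterizes each page and OCRs it, discarding any
        # existing (possibly scrambled) text layer.
        cmd.append("--force-ocr")
    cmd += [str(src), str(dst)]
    return cmd


def rebuild_text_layer(src: Path, dst: Path, force: bool = False) -> bool:
    """Run OCR if ocrmypdf is available; return False if the tool is missing."""
    if shutil.which("ocrmypdf") is None:
        return False
    # check=True raises CalledProcessError if ocrmypdf exits with an error.
    subprocess.run(build_ocr_command(src, dst, force), check=True)
    return True
```

Something like `rebuild_text_layer(Path("scanned.pdf"), Path("searchable.pdf"))` would cover the rasterized case, and passing `force=True` is the variant to try on the scrambled-text PDFs. OCR output is only as good as the scan, so the result is worth spot-checking.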