lalitmaganti 8 hours ago
Interesting! You might want to try Tabula in that case. It does well on the "obfuscated" PDFs of that type that I've come across; it's just a lot slower to run than pdf2text.
Macha 8 hours ago
It appears Tabula also gets the substituted content instead. For example, POS is replaced with & !ë on every line in every file. Comparing against the rendered PDF for other common text (like my name, the local supermarket, etc.), those all seem to be 1:1 substitutions too.
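If the substitutions really are consistent, one way to undo them might be to build a lookup table from strings whose real value is known from the rendered PDF and apply it to the rest of the extracted text. The following is only a minimal sketch under stated assumptions: it assumes the mapping works character by character, which the thread does not confirm, and the sample pairs and the helper names `build_substitution_map` and `decode` are hypothetical, not taken from the actual files or from any library.

```python
def build_substitution_map(pairs):
    """Build a garbled -> real character map from (garbled, real) pairs.

    Assumption: each garbled character stands for exactly one real
    character, so both strings in a pair must have the same length.
    """
    mapping = {}
    for garbled, real in pairs:
        if len(garbled) != len(real):
            raise ValueError(f"length mismatch: {garbled!r} vs {real!r}")
        for g, r in zip(garbled, real):
            # setdefault stores r on first sight and returns the stored
            # value afterwards, so a different r reveals a conflict.
            if mapping.setdefault(g, r) != r:
                raise ValueError(f"{g!r} maps to both {mapping[g]!r} and {r!r}")
    return mapping


def decode(text, mapping, unknown="?"):
    """Replace every mapped character; mark unmapped ones with `unknown`."""
    return "".join(mapping.get(ch, unknown) for ch in text)


# Hypothetical sample data, NOT the real substitutions from the PDFs.
known_pairs = [("&!ë", "POS"), ("ë%!&", "SHOP")]
mapping = build_substitution_map(known_pairs)
print(decode("%!&ë", mapping))
```

The conflict check is the useful part: if it raises on real data, the substitution is not a simple per-character one (for instance, it may operate on whole words or vary by font), and a different approach would be needed.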