pwagland 6 hours ago

PDF files can be signed; that is tamper resistance. Tamper resistance doesn't have to make any difference to the readability of the document.

kube-system 6 hours ago | parent | next [-]

So can any type of file -- that doesn't have any relevance to the supposed design of every file type in existence. Now, later versions of PDF do have explicit support for signatures, but what does this have to do with preventing OCR? OCR reads a file; it doesn't change the original file.

ranger_danger 6 hours ago | parent | next [-]

Some OCR solutions do change the original file, like OCRmyPDF. They take layers that were just images before and replace them with text layers so that you can search the document.

kube-system 6 hours ago | parent [-]

That isn't OCR, but an application of the resulting output of OCR. Again, a signature on a PDF or any type of file doesn't prevent you from reading it. (It also doesn't technically prevent you from changing it, it just enables the detection of changes to a particular file.)
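To make that distinction concrete, here is a minimal sketch. It uses an HMAC from the Python standard library as a stand-in for a real PDF signature (actual PDF signatures use public-key certificates, not a shared key), and the key and document bytes are made up for illustration. The point is only that reading needs no key, while any change fails verification under the original signer's key, even if someone re-signs the changed file with a key of their own.

```python
import hashlib
import hmac


def sign(data: bytes, key: bytes) -> bytes:
    """Return an authentication tag for data (stand-in for a PDF signature)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verify(data: bytes, tag: bytes, key: bytes) -> bool:
    """Check whether tag matches data under key, using a constant-time compare."""
    return hmac.compare_digest(sign(data, key), tag)


# Hypothetical signer key and document content, for illustration only.
signer_key = b"hypothetical-signer-key"
original = b"Load-bearing wall: 200 mm reinforced concrete"
tag = sign(original, signer_key)

# Reading the content requires no key and does not touch the tag.
text = original.decode("ascii")

# Changing the content is possible, but the old tag no longer verifies.
tampered = original.replace(b"200", b"100")

# Re-signing with a different key yields a tag that only verifies
# under that other key, not under the original signer's key.
other_key = b"hypothetical-other-key"
resigned_tag = sign(tampered, other_key)

print(verify(original, tag, signer_key))
print(verify(tampered, tag, signer_key))
print(verify(tampered, resigned_tag, signer_key))
```

Whether anyone actually checks the file against the original signer's identity is a separate question; the sketch only shows what the check can detect when it is performed.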

There's nothing about PDFs or image formats that prevents anyone from doing OCR. The reason construction documents are difficult to OCR is that OCR models are not well trained for them, and they're very technical documents where small details are significant. It doesn't have anything to do with the file format.

fithisux 5 hours ago | parent | prev [-]

True, but you can make modified copies if you reverse engineer it with OCR.

jimjimjim an hour ago | parent [-]

That's not really what I would call reverse engineering. If you read a PDF and type it into Word, is that reverse engineering? Either way, whatever you get is in no way going to convince anybody that it is the original.

ranger_danger 6 hours ago | parent | prev [-]

Can't one just remove the signature and re-sign it with anything else after tampering? Who verifies PDFs that hard?

kube-system 5 hours ago | parent [-]

If you're performing OCR, you're, almost by definition, disregarding the source file. The whole point of OCR is to be transformative.