| ▲ | wahern 22 days ago | |||||||
It should be much easier than that. You should be able to serially test whether each edit decodes to a sane PDF structure, cutting the cost much as you can crack a password one character at a time when the server doesn't use a constant-time memcmp. Are PDFs typically compressed by default? If so, that makes it even easier given the built-in checksums. But it's just not something you can do by throwing data at existing tools. You'll need to build a testing harness with instrumentation deep in the bowels of the decoders. This kind of work is the polar opposite of what AI code generators or naive scripting can accomplish. | ||||||||
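A rough sketch of the checksum idea, assuming the damaged span is a Base64-encoded zlib (Flate) stream and that errors are single look-alike characters. The `CONFUSABLE` table and both function names are made up for illustration and are not from the article. A zlib stream ends in an Adler-32 trailer, so a wrong guess almost always fails to inflate:

```python
import base64
import zlib

# Hypothetical look-alike characters; an assumption for illustration only.
CONFUSABLE = {"l": "1I", "1": "lI", "I": "l1", "O": "0", "0": "O"}


def inflates_cleanly(data: bytes) -> bool:
    """True if data is one complete zlib stream whose Adler-32 trailer matches."""
    d = zlib.decompressobj()
    try:
        d.decompress(data)
        d.flush()
    except zlib.error:
        # Corrupt Deflate data or a checksum mismatch.
        return False
    # eof is False for a truncated stream; unused_data means trailing junk.
    return d.eof and not d.unused_data


def candidate_fixes(b64_text: str):
    """Yield (position, replacement, decoded_bytes) for every single-character
    substitution that makes the stream inflate cleanly."""
    for i, ch in enumerate(b64_text):
        for alt in CONFUSABLE.get(ch, ""):
            edited = b64_text[:i] + alt + b64_text[i + 1:]
            try:
                raw = base64.b64decode(edited, validate=True)
            except ValueError:
                continue
            if inflates_cleanly(raw):
                yield i, alt, raw
```

This only checks a whole stream after the fact. The serial, edit-by-edit testing described above would need the decoder to report how far it got before failing, which the standard `zlib` module does not expose directly.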
| ▲ | JKCalhoun 21 days ago | parent | next [-] | |||||||
Not necessarily a PDF attachment? Someone who made some progress on one Base64 attachment got some XMP metadata that suggested a photo from an iPhone. Now I don't know if that photo was itself embedded in a PDF, but perhaps getting at least the first few hundred bytes decoded (even if it had to be done manually) would hint at the file-type of the attachment. Then you could run your tests for file fidelity. | ||||||||
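For what it's worth, a minimal sketch of that first step: decode only the start of the Base64 text and compare it against a few well-known magic numbers. The `MAGIC` table is deliberately short, and `sniff_base64` is a hypothetical helper name, not something from the article.

```python
import base64

# A few well-known file signatures; deliberately not exhaustive.
MAGIC = [
    (b"%PDF-", "pdf"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"PK\x03\x04", "zip"),
]


def sniff_base64(b64_text: str, prefix_chars: int = 64) -> str:
    """Guess the file type from the first few decoded bytes of a Base64 blob."""
    # Drop line breaks and spaces left over from the email encoding.
    cleaned = "".join(b64_text.split())
    head = cleaned[:prefix_chars]
    # Base64 decodes in groups of four characters, so trim to a whole group.
    head = head[: len(head) - len(head) % 4]
    raw = base64.b64decode(head)
    for signature, name in MAGIC:
        if raw.startswith(signature):
            return name
    return "unknown"
```

If the first line or two of the attachment is legible, this is enough to tell which decoder to point the fidelity tests at.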
| ||||||||
| ▲ | cluckindan 22 days ago | parent | prev | next [-] | |||||||
On the contrary, that kind of one-off tooling seems a great fit for AI. Just specify the desired inputs, outputs and behavior as accurately as possible. | ||||||||
| ||||||||
| ▲ | sznio 22 days ago | parent | prev | next [-] | |||||||
>It should be much easier than that. You should be able to serially test if each edit decodes to a sane PDF structure
That's pointed out in the article. It's easy for plaintext sections, but not for compressed sections. Didn't notice any mention of checksums. | ||||||||
| ▲ | pimlottc 22 days ago | parent | prev [-] | |||||||
I wonder if you could leverage some of the fuzzing frameworks that tools like Jepsen rely on. I’m sure there’s got to be one for PDF generation. | ||||||||