It should be much easier than that. You should should be able to serially test if each edit decodes to a sane PDF structure, reducing the cost similar to how you can crack passwords when the server doesn't use a constant-time memcmp. Are PDFs typically compressed by default? If so that makes it even easier given built-in checksums. But it's just not something you can do by throwing data at existing tools. You'll need to build a testing harness with instrumentation deep in the bowels of the decoders. This kind of work is the polar opposite of what AI code generators or naive scripting can accomplish.

▲

JKCalhoun 21 days ago | parent | next [-]

Not necessarily a PDF attachment?

Someone who made some progress on one Base64 attachment got some XMP metadata that suggested a photo from an iPhone. Now I don't know if that photo was itself embedded in a PDF, but perhaps getting at least the first few hundred bytes decoded (even if it had to be done manually) would hint at the file-type of the attachment. Then you could run your tests for file fidelity.

	▲	swsieber 21 days ago \| parent [-]
		I'd say 99% of the time, the first 10 bytes would be enough to know the file type.

▲

cluckindan 22 days ago | parent | prev | next [-]

On the contrary, that kind of one-off tooling seems a great fit for AI. Just specify the desired inputs, outputs and behavior as accurately as possible.

	▲	m000 21 days ago \| parent [-]
		You might be taking the "I" in AI too literally.

▲

sznio 22 days ago | parent | prev | next [-]

>It should be much easier than that. You should should be able to serially test if each edit decodes to a sane PDF structure

that's pointed out in the article. It's easy for plaintext sections, but not for compressed sections. Didn't notice any mention of checksums.

▲

pimlottc 22 days ago | parent | prev [-]

I wonder if you could leverage some of the fuzzing frameworks tools like Jepsen rely on. I’m sure there’s got to be one for PDF generation.