wahern 22 days ago

It should be much easier than that. You should be able to serially test whether each edit decodes to a sane PDF structure, reducing the cost much as you can when cracking passwords against a server that doesn't use a constant-time memcmp. Are PDFs typically compressed by default? If so, that makes it even easier, given the built-in checksums. But it's just not something you can do by throwing data at existing tools. You'll need to build a testing harness with instrumentation deep in the bowels of the decoders. This kind of work is the polar opposite of what AI code generators or naive scripting can accomplish.
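
A rough sketch of the "test each candidate serially" idea, assuming the damaged section is a plain zlib stream (an assumption, not something confirmed for the files in question). The helper name `prefix_is_plausible` is made up for illustration. It feeds a candidate prefix to Python's streaming decompressor and rejects it as soon as the decoder complains, including on a checksum mismatch at the end of the stream:

```python
import zlib


def prefix_is_plausible(prefix: bytes) -> bool:
    """Return True if `prefix` could still be the start of a valid zlib stream.

    Hypothetical helper: a truncated but otherwise valid prefix passes,
    while a bad header or a failed end-of-stream checksum is rejected.
    """
    decoder = zlib.decompressobj()
    try:
        decoder.decompress(prefix)
    except zlib.error:
        return False
    return True


def filter_candidates(candidates):
    """Keep only the candidate byte strings the decoder does not reject."""
    return [c for c in candidates if prefix_is_plausible(c)]
```

Note that a corrupted byte in the middle of a deflate stream will not always fail immediately, so this only prunes some candidates early; the checksum at the end of the stream is the stronger test.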

JKCalhoun 21 days ago | parent | next [-]

Not necessarily a PDF attachment?

Someone who made some progress on one Base64 attachment got some XMP metadata that suggested a photo from an iPhone. Now I don't know if that photo was itself embedded in a PDF, but perhaps getting at least the first few hundred bytes decoded (even if it had to be done manually) would hint at the file-type of the attachment. Then you could run your tests for file fidelity.

swsieber 21 days ago | parent [-]

I'd say 99% of the time, the first 10 bytes would be enough to know the file type.
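
For illustration, a minimal sketch of sniffing the file type from the first few Base64 characters. The signature table covers only a handful of well-known formats, and `sniff_base64` is a hypothetical name, not an existing tool:

```python
import base64

# A few well-known file signatures ("magic bytes"); far from exhaustive.
MAGIC = [
    (b"%PDF-", "pdf"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"PK\x03\x04", "zip"),
]


def sniff_base64(b64_text: str, n_chars: int = 16):
    """Guess a file type from the first `n_chars` Base64 characters.

    16 Base64 characters decode to 12 bytes, enough for the signatures above.
    Returns None when no signature matches.
    """
    head = "".join(b64_text.split())[:n_chars]  # drop line breaks and spaces
    head = head[: len(head) - len(head) % 4]    # Base64 decodes in groups of 4
    raw = base64.b64decode(head)
    for magic, name in MAGIC:
        if raw.startswith(magic):
            return name
    return None
```

So only the first 16 or so characters of the attachment would need to be transcribed correctly to pick which fidelity tests to run.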

cluckindan 22 days ago | parent | prev | next [-]

On the contrary, that kind of one-off tooling seems a great fit for AI. Just specify the desired inputs, outputs and behavior as accurately as possible.

m000 21 days ago | parent [-]

You might be taking the "I" in AI too literally.

sznio 22 days ago | parent | prev | next [-]

>It should be much easier than that. You should be able to serially test if each edit decodes to a sane PDF structure

That's pointed out in the article. It's easy for plaintext sections, but not for compressed sections. I didn't notice any mention of checksums.

pimlottc 22 days ago | parent | prev [-]

I wonder if you could leverage some of the fuzzing frameworks that tools like Jepsen rely on. I’m sure there’s got to be one for PDF generation.