Remix.run Logo
kevin_thibedeau 22 days ago

pdftoppm and Ghostscript (invoked via Imagemagick) re-rasterize full pages to generate their output. That's why it was slow. Even worse with a Q16 build of Imagemagick. Better to extract the scanned page images directly with pdfimages or mutool.

Followup: pdfimages is 13x faster than pdftoppm

masfuerte 21 days ago | parent [-]

This. Not only is it faster, the images are likely to be of better quality. If you rasterize the pages then the images will be scaled, unless you get very lucky.