rahimnathwani 4 days ago
The PDFs this process creates use MRC (Mixed Raster Content), which splits each page into separate layers. There is a color background layer for images and page tone, and a foreground layer that carries the color of the text and line art. A binary (black-and-white) mask layer decides, pixel by pixel, whether the foreground or the background shows. This layering is why you can get such small files while keeping crisp text and reasonable image quality. The sharp text shapes live in the 1-bit mask, which compresses very well, so the color layers can be stored at lower resolution.

If you want purely black-and-white output, you can extract just the monochrome mask layer from each page and ignore the color layers entirely. This suits a PDF with yellowing pages and/or not-quite-black text, but not many illustrations.

First, extract the images:

  mutool extract in.pdf

Then delete the sRGB (color) images. Then combine the remaining images with the ImageMagick command line:

  convert -negate *.png out.pdf

This gives you a clean black-and-white PDF without any of the color information or artifacts from the background layer.

Here's a script that does all that. It worked with two different PDFs from IA; I haven't tested it with other sources of MRC PDFs. The script depends on mutool (part of MuPDF) and ImageMagick.

https://gist.github.com/rahimnathwani/44236eaeeca10398942d2c...
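For reference, the three steps above can be sketched as a short bash script. This is a minimal sketch, not the code in the gist. It makes several assumptions:

- mutool, convert and identify are on your PATH.
- mutool writes the extracted images as image-NNNN.png in the current directory.
- ImageMagick reports the monochrome layers' colorspace as Gray. Anything reported as sRGB, RGB, CMYK or YCbCr is treated as a color layer and dropped.

The function names are hypothetical. The script also sorts the kept files numerically instead of trusting the shell glob, in case the image numbers aren't all the same width.

```bash
#!/usr/bin/env bash
# Minimal sketch (not the linked gist): keep the non-color images that
# mutool extracts from an MRC PDF, invert them, and bundle them into a
# new PDF. Assumes mutool, convert and identify are on PATH and that
# mutool writes images as image-NNNN.png in the current directory.
set -euo pipefail

# Print an absolute version of a path without requiring it to exist.
abs_path() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;
    *) printf '%s\n' "$PWD/$1" ;;
  esac
}

# Succeed if an ImageMagick colorspace name denotes a color image.
# (Assumption: the extracted monochrome layers report "Gray".)
is_color_space() {
  case "$1" in
    sRGB | RGB | CMYK | YCbCr) return 0 ;;
    *) return 1 ;;
  esac
}

# Sort names like image-0042.png by the number after the dash.
sort_by_number() {
  sort -t- -k2,2n
}

# Runs in a subshell so the cd and the cleanup trap stay local.
mrc_to_bw() (
  in_pdf=$(abs_path "$1")
  out_pdf=$(abs_path "$2")
  workdir=$(mktemp -d)
  trap 'rm -rf "$workdir"' EXIT
  cd "$workdir"

  # Step 1: dump every image (and font) into the temp directory.
  mutool extract "$in_pdf" >/dev/null

  # Step 2: skip the color layers, keep the monochrome ones.
  mono=()
  for f in image-*.png; do
    [ -e "$f" ] || continue
    cs=$(identify -format '%[colorspace]' "$f")
    if ! is_color_space "$cs"; then
      mono+=("$f")
    fi
  done
  if [ "${#mono[@]}" -eq 0 ]; then
    echo "no monochrome images found in $in_pdf" >&2
    exit 1
  fi

  # Step 3: order the pages, invert them, and write the PDF.
  sorted=()
  while IFS= read -r f; do
    sorted+=("$f")
  done < <(printf '%s\n' "${mono[@]}" | sort_by_number)
  convert -negate "${sorted[@]}" "$out_pdf"
)

# Run only when given arguments, e.g. ./mrc_to_bw.sh in.pdf out.pdf
if [ "$#" -eq 2 ]; then
  mrc_to_bw "$1" "$2"
elif [ "$#" -ne 0 ]; then
  echo "usage: $0 in.pdf out.pdf" >&2
  exit 2
fi
```

Usage: ./mrc_to_bw.sh in.pdf out.pdf. On some Linux distributions ImageMagick's security policy (policy.xml) disables writing PDFs. In that case the final convert step will fail until that policy is relaxed.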