sieste 3 hours ago

Because PDF is so popular, there is a lot of demand for PDF processing tools. And the format is so complex that there are many nontrivial and creative ways to do PDF processing. That's why these "Hello World" projects usually make the Top 5 on HN, and one of the upvotes is usually from me.

forgotpwd16 3 hours ago | parent [-]

>many nontrivial and creative ways to do pdf processing

They all wrap PDFlib and provide the same functionality.

sam_lowry_ 3 hours ago | parent [-]

I am already well served by Ghostscript, GIMP, ImageMagick, etc.:

Optimize PDF:

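    #!/bin/bash
    # Recompress a PDF in place; the original is replaced only if gs succeeds.
    INPUT="$1"
    OUTPUT="$(mktemp --suffix=.pdf)"   # --suffix is a GNU mktemp option
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook \
      -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$OUTPUT" "$INPUT" \
      && mv "$OUTPUT" "$INPUT"

Merge PDF: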

    #!/bin/sh
    gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
      -dCompatibilityLevel=1.3 -dPDFSETTINGS=/ebook \
      -sOutputFile=merged.pdf "$@"

And so on and so forth.

Moreover, I see a webapp and I immediately assume everything I do in this app is exfiltrated and abused.

I can check that the webapp advertised above is indeed local-first, but I can't be 100% sure they don't steal my data in a way I did not foresee, e.g. via websockets or cookies.

I learnt this the hard way by being on Instagram and Gmail.

ptspts 3 hours ago | parent | next [-]

Your commands to process PDF with Ghostscript are lossy (they lose lots of metadata and in minor ways they also change how the PDF renders), and they produce very large PDF files.

x3ro 3 hours ago | parent [-]

Can you expand on why the produced PDF files are supposed to be larger than the originals? I've not observed that yet.
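Either way, the size question is easy to check per file rather than argue in general. A minimal sketch, assuming POSIX sh and `wc`; the helper name `keep_smaller` is made up for illustration and is not part of Ghostscript or of any script above. It only compares byte counts and says nothing about lost metadata or rendering changes:

```shell
#!/bin/sh
# keep_smaller ORIGINAL CANDIDATE   (hypothetical helper, not a Ghostscript feature)
# Replaces ORIGINAL with CANDIDATE only when CANDIDATE is non-empty and
# strictly smaller; otherwise deletes CANDIDATE and leaves ORIGINAL untouched.
keep_smaller() {
    # Both arguments must be existing regular files.
    [ -f "$1" ] && [ -f "$2" ] || return 1

    # $(( ... )) strips the leading spaces some wc implementations print.
    orig_size=$(( $(wc -c < "$1") ))
    new_size=$(( $(wc -c < "$2") ))

    if [ "$new_size" -gt 0 ] && [ "$new_size" -lt "$orig_size" ]; then
        mv -- "$2" "$1"
        echo "replaced: $orig_size -> $new_size bytes"
    else
        rm -f -- "$2"
        echo "kept original: $orig_size bytes (candidate was $new_size)"
    fi
}
```

In the optimize script upthread, calling `keep_smaller "$INPUT" "$OUTPUT"` after the `gs` line instead of the plain `mv` would keep the original whenever recompression made the file bigger.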

kkfx an hour ago | parent | prev | next [-]

For better compression, my personal preference is:

    pdftops -paper A4 -expand -level3 file.pdf # I'm from EU, so A4 is my common paper format

    ps2pdf14 -dEmbedAllFonts=true        \
    -dUseFlateCompression=true           \
    -dOptimize=true                      \
    -dProcessColorModel=/DeviceRGB       \
    -r72                                 \
    -dDownsampleGrayImages=true          \
    -dGrayImageResolution=150            \
    -dAutoFilterGrayImages=false         \
    -dGrayImageDownsampleType=/Bicubic   \
    -dDownsampleMonoImages=true          \
    -dMonoImageResolution=150            \
    -dMonoImageDownsampleType=/Subsample \
    -dDownsampleColorImages=true         \
    -dColorImageResolution=150           \
    -dAutoFilterColorImages=false        \
    -dColorImageDownsampleType=/Bicubic  \
    -dPDFSETTINGS=/ebook                 \
    -dNOSAFER                            \
    -dALLOWPSTRANSPARENCY                \
    -dShowAnnots=false                   \
      file.ps compressed.pdf

sam_lowry_ an hour ago | parent [-]

If you do `-dEmbedAllFonts=true` then probably `-dSubsetFonts=true` would also be useful.

And for the rest, `-dPDFSETTINGS=/ebook` should already cover most of the values that you set explicitly.

tedk-42 2 hours ago | parent | prev [-]

You're being downvoted because not everyone has CLI access to a server and the required ghostscript binaries etc.

Realistically, most 'normal users' have PDF needs like the ones these links cover. We as tech people can safely give these sites to non-technical people and have confidence their data isn't being stolen by dodgy remote servers (think gas/electricity bills, invoices, bank statements, etc., which are a PII gold mine).

sam_lowry_ an hour ago | parent [-]

Server? What server? Ghostscript is available in virtually any Linux distro, on Mac with and without brew, and even on Windows.

I have no confidence in any website, especially one that claims to be local-only but can technically change at the whim of the developer once it starts getting enough traffic from users.

OTOH, I trust 30+-year-old software sitting on my hard drive not to phone home on every keystroke.