Remix.run Logo
ajross 21 hours ago

PDF files are for storing fixed (!!) output of printed/printable material. That's where the format's roots are via Postscript, it's where the format found its main success in document storage, and it's the metaphor everyone has in mind when using the format.

PDFs don't change. PDFs are what they look like.

Except they aren't, because Adobe wanted to be able to (ahem) "annotate" them, or "save changes" to them. And Adobe wanted this because they wanted to sell Acrobat to people who would otherwise be using MS Word for these purposes.

And in so doing, Adobe broke the fundamental design paradigm of the format. And that has had (and continues to have, to hilarious effect) continuing security impact for the data that gets stored in this terrible format.

detourdog 18 hours ago | parent | next [-]

When Acrobat came out cross platform was not common. Being able to publish a document that could be opened on multiple platforms was a big advantage. I was using it to distribute technical specifications in the mid 90's. Different pages of these specifications came from, Filemaker, Excel, Word, Mini-Cad, Photoshop, Illustrator, and probably other applications as well. We would combine these into a single PDF file. This simplified version control. This also meant that bidders could not edit the specifications.

None of that could be accomplished with Word alone. I think you are underestimating the qualities of PDF for distribution of complex documents.

ajross 15 hours ago | parent [-]

> This also meant that bidders could not edit the specifications.

But they can! That's the bug, PDF is a mutable file format owing to Adobe's muckery. And you made the same mistake that every government redactor and censor (up to and including the ?!@$! NSA per the linked article) has in the intervening decades.

The file format you thought you were using was a great fit for your problem, and better than MS Word. The software Adobe shipped was, in fact, something else.

detourdog 3 hours ago | parent [-]

Our use wasn't that high stakes. I also can't remember all the details but were using 3rd party publishing tools beyond Acrobat. Acrobat was the reader that our specifications could be opened in. We had to use additional software and to add consistent headers and get accurate pager numbers.

ogurechny 13 hours ago | parent | prev [-]

It started in the '80s. PostScript was the big deal. It was a printer language, not a document language. It was not limited to “(mostly) text documents”, even though complex vector fonts and even hinting were introduced. For example, you could print some high quality vector graphs in native printer resolution from systems which would never ever get enough memory to rasterise such giant bitmaps, by writing/exporting to PostScript. That's where Adobe's business was. See also NeWS and NeXT.

However, arbitrary non-trivial PostScript files were of little use to people without a hardware or software rasteriser (and sometimes fonts matching the ones the author had, and sometimes the specific brand of RIP matching the quirks of authoring software, etc.), so it was generally used by people in publishing or near it. PDF was an attempt to make a document distribution format which was more suitable to more common people and more common hardware (remember the non-workstation screen resolutions at the time). I doubt that anyone imagined typical home users writing letters and bulletins in Acrobat, of all things (though it does happen). It would be similar to buying Photoshop to resize images (and waiting for it to load each time). Therefore, competitor to Word it was not. Vice versa, Word file was never considered a format suitable for printing. The more complex the layout and embedded objects, the less likely it would render properly on publisher's system (if Microsoft Office did exist for its architecture at all). Moreover, it lacked some features which were essential for even small scale book publishing.

Append-only or versioned-indexed chunk-based file formats for things we consider trivial plain data today were common at the time. Files could be too big to rewrite completely each time even without edits, just because of disk throughput and size limits. The system could not be able to load all of the data into memory because of addressing or size limitations (especially when we talk about illustrations in resolutions suitable for printing). Just like modern games only load the objects in player's vicinity instead of copying all of the dozens or hundreds of gigabytes into memory, document viewers had to load objects only in the area visible on screen. Change the page or zoom level, and wait until everything reloads from disk once again. Web browsers, for example, handle web pages of any length in the same fashion. I should also remind you that default editing mode in Word itself in the '90s was not set to WYSIWYG for similar performance reasons. If you look at the PDF object tree, you can see that some properties are set on the level above the data object, and that allows overwriting the small part of the index with the next version to change, say, position without ever touching the chunk in which the big data itself stays (because appending the new version of that chunk, while possible, would increase the file size much more).

Document redraw speed can be seen in this random video. But that's 1999, and they probably got a really well performing system to record the promotional content. https://www.youtube.com/watch?v=Pv6fZnQ_ExU

PDF is a terrible format not because of that, but because its “standard” retroactively defined everything from the point of view of Acrobat developer, and skipped all the corner cases and ramifications (because if you are an Acrobat developer, you define what is a corner case, and what is not). As a consequence, unless you are in a closed environment you control, the only practical validator for arbitrary PDFs is Acrobat (I don't think that happened by chance). The external client is always going to say “But it looks just fine on my screen”.