nneonneo 3 days ago

Microsoft seems to have known that they could ram basically anything through a standards body, so they presumably didn't bother to actually try and simplify the standard. Instead, it's basically an XML serialization of their older binary formats, complete with all of the quirks and bugs that have to be emulated for 100% compatibility.

To be fair, we're talking about a product line with over 35 years of history here. Cruft in the format builds up but can never be removed, so long as you commit to strong backwards compatibility - which Microsoft has always done.

Fun trivia: many of the old binary formats use a meta-format called OLE2 (Object Linking and Embedding). The file format is essentially a FAT-style filesystem packed into a single file, with a FAT sector chain, file blocks aligned to a specific power-of-two size, etc. This made saving files very fast, but raised the possibility of fragmentation inside the file (where individual sub-files are scattered over many non-contiguous blocks); hence, users were advised to "Save As..." periodically for large/complex files to optimize the internal storage.
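To make the chain and fragmentation point concrete, here is a toy sketch in Python. It is not a real parser: `toy_fat`, `follow_chain` and `count_fragments` are names made up for illustration, and the table is an in-memory list rather than anything read from a file. The only detail borrowed from the real format is the end-of-chain marker value.

```python
# End-of-chain marker value as used by the compound file format.
ENDOFCHAIN = 0xFFFFFFFE


def follow_chain(fat, start):
    """Return the sector numbers of one stream, in the order they are read."""
    chain = []
    seen = set()
    sector = start
    while sector != ENDOFCHAIN:
        # Guard against loops and out-of-range entries in a corrupt table.
        if sector in seen or sector >= len(fat):
            raise ValueError("corrupt chain")
        seen.add(sector)
        chain.append(sector)
        sector = fat[sector]  # fat[i] is the sector that follows sector i
    return chain


def count_fragments(chain):
    """Count runs of consecutive sectors; 1 means fully contiguous."""
    if not chain:
        return 0
    return 1 + sum(1 for a, b in zip(chain, chain[1:]) if b != a + 1)


# Hypothetical table: two streams whose sectors ended up interleaved.
toy_fat = [2, 3, 4, 5, ENDOFCHAIN, ENDOFCHAIN]
stream_a = follow_chain(toy_fat, 0)
stream_b = follow_chain(toy_fat, 1)
print(stream_a, count_fragments(stream_a))
print(stream_b, count_fragments(stream_b))
```

Each stream here occupies three sectors but is split into three separate runs. Rewriting the whole file with "Save As..." would presumably lay each stream out as a single run again.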

rtpg 3 days ago

"You have to standardize the format"

"OK we will standardize our serialization format"

It's... I guess malicious compliance, though also if you don't care about interop you're not going to try to abstract away your internal application structures, are you!

I appreciate the standard existing rather than it not existing. Trying to have the standard exist in this way has always felt like an uphill battle, and at least now there's _something_.

You will still have a better time if you emulate how Office does things, but at least you have a bit more documentation to go along with it.

flomo 3 days ago

Officially it's now MS-CFB (I think). OLE2 generally refers to a predecessor to COM, and not just the file format.

https://learn.microsoft.com/en-us/openspecs/windows_protocol...
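For anyone who wants to check whether a file is one of these containers, here is a minimal sketch of reading the first few header fields. The offsets and signature are as I understand them from the MS-CFB layout. The function name `read_cfb_header` and the returned dictionary keys are made up for illustration, and this is nowhere near a full parser.

```python
import struct

# 8-byte magic number at the start of a compound file.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def read_cfb_header(data):
    """Parse a handful of header fields from the first bytes of a file."""
    if len(data) < 34 or data[:8] != CFB_SIGNATURE:
        raise ValueError("not a compound file")
    # Five little-endian uint16 fields starting at offset 24:
    # minor version, major version, byte order mark, sector shift,
    # mini sector shift. Sector sizes are stored as powers of two.
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", data, 24
    )
    return {
        "major_version": major,
        "minor_version": minor,
        "byte_order": byte_order,
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
    }


# Synthetic header for demonstration: signature, 16 zero bytes for the
# CLSID, then the five fields (the values here are assumed to be typical
# of a version 3 file).
fake = CFB_SIGNATURE + bytes(16) + struct.pack("<5H", 0x3E, 3, 0xFFFE, 9, 6)
info = read_cfb_header(fake)
print(info["sector_size"], info["mini_sector_size"])
```

On a real file you would pass in the first 512 bytes read from disk in binary mode instead of the synthetic `fake` buffer.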

masfuerte 3 days ago

Being pedantic, OLE1 was the predecessor. OLE2 used COM for its plumbing.

Wikipedia has an article on the file format [1]. It was quite nice. It works like an uncompressed zip file with transactional updates.

Earlier Word document formats were much worse. They were a dump of Word's memory contents. Saving and loading were very quick though!

[1]: https://en.wikipedia.org/wiki/Compound_File_Binary_Format