Gigachad 3 days ago

My possibly incomplete understanding was that the original Office file format was basically just raw dumps of the internal C data structures. Not designed or specified in any way.

The XML version likely carries a lot of baggage having to be compatible with that.

lmkg 3 days ago | parent [-]

They weren't "just" raw dumps of internal C structures. It takes careful design work to dump raw memory in a usable fashion. Consider: You can't just write a pointer to disk and then read it back next week.

Binary MS Office format is a phenomenal piece of engineering to achieve a goal that's no longer relevant: fast save/load on late-'80s hard drives. Other programs took minutes to save a spreadsheet; Excel took seconds. It did this by making sure its in-memory data structures for a document could be dumped straight to disk without transformation.
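
To make the pointer problem concrete, here's a rough sketch of the general trick. This is not the actual Office layout; the record format and the names (CELL, build, walk, save, load) are all made up for illustration. The idea is that records link to each other by byte offset within one buffer instead of by memory address, so the buffer means the same thing on disk as it does in memory:

```python
import struct

# Hypothetical fixed-size cell record (not the real Excel layout):
# a little-endian double value plus the byte offset of the next cell
# in the same buffer. An offset of -1 means "no next cell".
CELL = struct.Struct("<dq")

def build(values):
    """Lay cells out back to back in one buffer, linked by offsets."""
    buf = bytearray(CELL.size * len(values))
    for i, v in enumerate(values):
        nxt = (i + 1) * CELL.size if i + 1 < len(values) else -1
        CELL.pack_into(buf, i * CELL.size, v, nxt)
    return buf

def walk(buf):
    """Follow the offset links starting at byte 0 and collect the values."""
    out = []
    off = 0 if buf else -1
    while off != -1:
        value, off = CELL.unpack_from(buf, off)
        out.append(value)
    return out

def save(buf, path):
    """Saving is a single write of the buffer, with no transformation."""
    with open(path, "wb") as f:
        f.write(buf)

def load(path):
    """Loading is a single read; the offsets are valid immediately."""
    with open(path, "rb") as f:
        return bytearray(f.read())
```

If the link field held a real memory address, the file would be garbage the next time the program started, because the buffer would land somewhere else. With offsets, save and load are each one I/O call and nothing has to be fixed up afterwards.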

But yes, this approach carries a shitton of baggage. And that achievement is no longer relevant in a world where consumer hardware can parse XML documents on the fly.

I have heard it argued, though, that the "baggage" isn't the file format. It's actually the full historical featureset of Excel. Being backwards-compatible means being able to faithfully represent the features of old Excel, and the essential complexity of that far outweighs the incidental complexity of how those features were encoded.