etothepii 3 days ago

I spent a lot of time last year replicating every valid Excel number format, and I've really struggled to find good documentation on the Excel format once you really get into the weeds.

The use of namespaces is also incredibly annoying: as far as I can tell, none of the XML libraries I can find support them well for the "human-readable" part, the prefixes.

When you crack open the file, it feels like you should be able to find everything you need with an XPath like //w:t, but none of the XML parsers I've found cope well with the namespaces.

rhdunn 3 days ago | parent [-]

What language?

In Python, the `find`, `findall`, etc. methods take a namespace dictionary. E.g.

    # Relative path: ElementTree rejects an absolute "//..." path on an element.
    result = doc.findall(".//w:t", namespaces={"w": "..."})
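
A minimal runnable sketch of that Python call, assuming the standard library's `xml.etree.ElementTree`. The sample document and the `urn:example:w` URI are made up for illustration and stand in for whatever the real file declares.

```python
import xml.etree.ElementTree as ET

# Made-up namespace URI for illustration; a real file declares its own.
NS = {"w": "urn:example:w"}

XML = """<w:document xmlns:w="urn:example:w">
  <w:body>
    <w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>
  </w:body>
</w:document>"""

root = ET.fromstring(XML)

# The "w" prefix in the path is resolved through NS, not through the file,
# so only the URI has to match what the document declares.
texts = [t.text for t in root.findall(".//w:t", namespaces=NS)]
print(texts)
```

The prefix in the dictionary is local to the query: any key works as long as it maps to the URI the document actually uses.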

In C# you can do:

    var navigator = doc.Root!.CreateNavigator();
    var nsManager = new XmlNamespaceManager(navigator.NameTable);
    nsManager.AddNamespace("w", "...");
    var results = doc.Root?.XPathSelectElements("//w:t", nsManager);

In Java you need to enable a namespace-aware flag in the settings to get namespaces to work. I can't recall off-hand how to do that.

etothepii 2 days ago | parent [-]

Yes, I'm talking about Python. The namespaces are long, dare I say ugly, URLs. Even though the XML file itself uses `<w:t`, you can't query for `w:t` unless you pass a dictionary argument containing `{"w":"https://...."}`, which means you need to do a bunch of reading and understanding before you can start playing with the files.
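
One way to avoid looking the URIs up by hand, sketched under the assumption that the standard library's `xml.etree.ElementTree` is in use: let the parser report the prefix declarations it encounters and build the dictionary from those. `namespace_map` is a hypothetical helper, and the sample document and `urn:example:w` URI are made up. The `{*}` wildcard in the last query needs Python 3.8 or later.

```python
import io
import xml.etree.ElementTree as ET

# Made-up document and namespace URI, for illustration only.
XML = b"""<w:document xmlns:w="urn:example:w">
  <w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body>
</w:document>"""

def namespace_map(source):
    """Collect the prefix -> URI pairs declared anywhere in the document."""
    ns = {}
    # "start-ns" events yield (prefix, uri) tuples as declarations are parsed.
    for _event, (prefix, uri) in ET.iterparse(source, events=["start-ns"]):
        ns.setdefault(prefix, uri)  # keep the first declaration of each prefix
    return ns

ns = namespace_map(io.BytesIO(XML))
root = ET.fromstring(XML)

# Query with the prefixes the file itself uses.
by_prefix = [t.text for t in root.findall(".//w:t", namespaces=ns)]

# Or skip the dictionary entirely and match "t" in any namespace.
by_wildcard = [t.text for t in root.findall(".//{*}t")]

print(ns, by_prefix, by_wildcard)
```

The wildcard form matches `t` in any namespace, so it can over-match in files that mix several vocabularies; the prefix form is stricter.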