StopDisinfo910 2 days ago

> Calling XML human readable is a stretch.

That’s always been the main flaw of XML.

There are very few use cases where you wouldn’t be better served by an equivalent, more efficient binary format.

You will need a tool to debug xml anyway as soon as it gets a bit complex.

bayindirh 2 days ago | parent | next [-]

A simple modern text editor (Vim, Kate) can sanity-check an XML file in real time. Why debug?

StopDisinfo910 2 days ago | parent [-]

Because issues with XML are pretty much never about sanity checks. After all, XML is pretty much never written by hand but by tools, which will most likely produce valid XML.

Most of the time you will actually be debugging what’s inside the file to understand why it caused an issue and to find out whether that comes from the writing or the receiving side.

It’s pretty much like with a binary format, honestly. XML basically has all the downsides of one with none of the upsides.

bayindirh 2 days ago | parent [-]

I mean, I found it pretty trivial to write parsers for my XML files, which are not simple ones, TBH. The simplest one of them contains a bit more than 1700 lines.

It's also pretty easy to emit, "I didn't find what I'm looking for under $ELEMENT" while parsing the file, or "I expected a string but I got $SOMETHING at element $ELEMENT".
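A minimal sketch of that kind of error reporting, in Python with the standard library's `xml.etree.ElementTree`. The element names (`run`, `sample`, `mass`) and the helpers `require_child` and `require_float` are hypothetical, made up for illustration, and not taken from the codebase being discussed:

```python
import xml.etree.ElementTree as ET


class FormatError(ValueError):
    """The XML is well-formed, but its content is not what the parser expects."""


def require_child(parent, name):
    # Navigate to the first child element with this name, or fail loudly.
    child = parent.find(name)
    if child is None:
        raise FormatError(f"I didn't find <{name}> under <{parent.tag}>")
    return child


def require_float(parent, name):
    # Read the child's text as a number; report the offending value otherwise.
    child = require_child(parent, name)
    text = (child.text or "").strip()
    try:
        return float(text)
    except ValueError:
        raise FormatError(
            f"I expected a number but I got {text!r} at element <{name}>"
        ) from None


def parse_sample(xml_text):
    # Hypothetical layout: <run><sample><mass>...</mass></sample></run>
    root = ET.fromstring(xml_text)
    sample = require_child(root, "sample")
    return {"mass": require_float(sample, "mass")}
```

A malformed file still fails inside `ET.fromstring`; the helpers only cover the case where the file is valid XML but carries the wrong content.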

Maybe my view is distorted because I have worked with XML files for more than a decade, but I never spent more than 30 seconds debugging an XML parsing process.

Also, this was one of the first parts I "sealed" in said codebase and never touched again, because it worked, even if the incoming file is badly formed (by erroring out correctly and cleanly).

StopDisinfo910 2 days ago | parent [-]

> It's also pretty easy to emit, "I didn't find what I'm looking for under $ELEMENT" while parsing the file, or "I expected a string but I got $SOMETHING at element $ELEMENT".

I think we are actually in agreement. You could do exactly the same with a binary format without having to deal with the cumbersomeness of XML, which is my point.

You are already treating XML like one, writing error handling in your own parsers and "sealing" it.

What’s the added value of xml then?

bayindirh 2 days ago | parent [-]

> cumbersomeness of xml...

Telling the parser to navigate to the first element named $ELEMENT, checking a couple of conditions and assigning values in a defensive manner is not cumbersome in my opinion.

I would not call parsing binary formats cumbersome either (I'm a demoscene fan, so I aspire to match their elegance and performance in my codebases), but it's not the pragmatic approach for the particular problem at hand.

So, we arrive at your next question:

> What’s the added value of xml then?

There are several. Let me try to explain.

First of all, it's a self-documenting text format. I don't need extensive documentation for it. I have a spec, but someone opening it in a text editor can see what it is, and understand how it works. When half (or most) of the users of your code are non-CS researchers, that's a huge plus.

Speaking of non-CS researchers, these folks will be the ones generating these files from different inputs. Writing XML in any programming language, including FORTRAN and MATLAB (not kidding), is 1000 times easier than writing a binary blob.
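To give a rough idea of how little code the writing side needs, here is a sketch in Python using only the standard library. The `results` and `value` element names and the `write_results` function are hypothetical, chosen for illustration; the real format is not shown in this thread:

```python
import xml.etree.ElementTree as ET


def write_results(values, version="1"):
    # Build <results version="..."> with one <value name="..."> child per entry.
    root = ET.Element("results", {"version": version})
    for name, value in values.items():
        item = ET.SubElement(root, "value", {"name": name})
        # repr() keeps the full precision of Python floats in the text form.
        item.text = repr(value)
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(root, encoding="unicode")
```

The library takes care of escaping and nesting, so the producer never has to think about byte offsets, padding or endianness.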

Extending the file format I have built on XML is extremely easy. You change a version number, maybe add a couple of paths to your parser, and you're done. If you feel fancy, allow for backwards compatibility, or just throw an error if you don't like the version (this is for non-CS folks mostly. I'm not that cheap). I don't need to work with nasty offsets or slight behavior differences that make me pull my hair out.

Preservation is much easier, too. Scientific software rots much quicker than conventional software, so keeping the file format readable is better for preservation.
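The version handling can be sketched roughly like this in Python. Everything here is an assumption for illustration: the `version` attribute on the root element, the `mass` and `charge` elements, and the `parse_v1`/`parse_v2` functions are hypothetical names, not the actual format:

```python
import xml.etree.ElementTree as ET


def parse_v1(root):
    # Hypothetical version 1: a single required <mass> element.
    # A real parser would check for a missing element before converting.
    return {"mass": float(root.findtext("mass"))}


def parse_v2(root):
    # Hypothetical version 2: adds an optional <charge>, defaulting to 0.
    data = parse_v1(root)
    data["charge"] = float(root.findtext("charge", default="0"))
    return data


# One entry per supported version; extending the format means adding a row.
PARSERS = {"1": parse_v1, "2": parse_v2}


def parse_document(xml_text):
    root = ET.fromstring(xml_text)
    version = root.get("version")
    parser = PARSERS.get(version)
    if parser is None:
        # The "just throw an error if you don't like the version" path.
        raise ValueError(f"unsupported file format version: {version!r}")
    return parser(root)
```

Keeping the old parser in the table is what gives backwards compatibility here; dropping an entry turns that version into a clean error instead of a silent misread.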

"Sealing" in that project's parlance means "verify and don't touch it again". When you're comparing your results with a ground truth with 32 significant digits, you don't poke here and there leisurely. If it works, you add a disclaimer that the file is "verified at YYYYMMDD", and is closed for modifications, unless necessary. The same principle is also valid for performance reasons.

So, building a complex file format over XML makes sense. It makes the format accessible, cross-platform, easier to preserve and more.

scotty79 a day ago | parent | prev [-]

With this you have an efficient binary format and the generality of XML:

https://en.m.wikipedia.org/wiki/Efficient_XML_Interchange

But somehow Google forgot to implement this.