ongy 2 days ago

Calling XML human readable is a stretch. It can be with some tooling, but JSON is easier to read both with tooling and without. The schema has some bearing on how human readable the serialization is, but I know significantly fewer people who can parse an XML file by sight than JSON.

Efficient is also... questionable. IIRC it requires full Turing machine power just to validate (it surely does to fully parse). By which metric is XML efficient?

bayindirh 2 days ago | parent | next [-]

By efficiency, I mean it's text and compresses well. If we mean speed, there are extremely fast XML parsers around; see this page [0] for the state of the art.
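As a rough illustration of the "compresses well" point, here is a minimal Python sketch; the element names and data are invented, loosely imitating a repetitive 3D object file:

```python
import gzip

# A hypothetical, highly repetitive XML fragment, like a mesh export.
xml = "<mesh>" + "".join(
    f"<vertex x='{i}' y='{i}' z='{i}'/>" for i in range(1000)
) + "</mesh>"

raw = xml.encode("utf-8")
packed = gzip.compress(raw)

# Repetitive markup compresses to a small fraction of its original size.
print(len(raw), len(packed))
```

The repeated tag and attribute names are exactly the kind of redundancy that general-purpose compressors eliminate almost for free.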

For hands-on experience, I used rapidxml for parsing said 3D object files. A 116K XML file is parsed instantly (the rapidxml library's aim is speed parity with strlen() on the same file, and they deliver).

Converting the same XML to my own memory model took less than 1ms including creation of classes and interlinking them.

This was on 2010s era hardware (a 3rd generation i7 3770K to be precise).

Verifying the same file against an XSD schema would add some milliseconds, not more. Considering the core of the problem might take hours on end, torturing memory and CPU, a single 20 ms overhead is basically free.

I believe JSON's and XML's readability is directly correlated with how the file is designed and written (incl. terminology and how it's formatted), but to be frank, I have seen both good and bad examples of both.

If you can mentally parse HTML, you can mentally parse XML. I tend to learn to parse any markup and programming language mentally so I can simulate them in my mind, but I might be an outlier.

If you're designing a file format based on either for computers only, approaching the unreadability of Perl regular expressions is not hard.

Oops, forgot the link:

[0]: https://pugixml.org/benchmark.html

StopDisinfo910 2 days ago | parent | prev | next [-]

> Calling XML human readable is a stretch.

That’s always been the main flaw of XML.

There are very few use cases where you wouldn't be better served by an equivalent, more efficient binary format.

You will need a tool to debug XML anyway as soon as it gets a bit complex.

bayindirh 2 days ago | parent | next [-]

A simple text editor of today (Vim, Kate) can sanity-check an XML file in real time. Why debug?
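The kind of well-formedness check an editor runs in real time takes only a few lines; a sketch using Python's standard library, with made-up sample strings:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the string parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<a><b>ok</b></a>"))  # properly nested -> True
print(is_well_formed("<a><b>oops</a>"))    # mismatched close -> False
```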

StopDisinfo910 2 days ago | parent [-]

Because issues with XML are pretty much never sanity-check failures. After all, XML is pretty much never written by hand but by tools, which will most likely produce valid XML.

Most of the time you will actually be debugging what’s inside the file to understand why it caused an issue and find if that comes from the writing or receiving side.

It's pretty much like a binary format, honestly. XML basically has all the downsides of one with none of the upsides.

bayindirh 2 days ago | parent [-]

I mean, I found it pretty trivial to write parsers for my XML files, which are not simple ones, TBH. The simplest one contains a bit more than 1,700 lines.

It's also pretty easy to emit "I didn't find what I'm looking for under $ELEMENT" while parsing the file, or "I expected a string but I got $SOMETHING at element $ELEMENT".
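Error messages of that shape are cheap to produce; a minimal sketch with Python's ElementTree, where the element names are invented for illustration:

```python
import xml.etree.ElementTree as ET

def require_child(parent: ET.Element, name: str) -> ET.Element:
    """Return the named child, or fail with a message naming the element."""
    child = parent.find(name)
    if child is None:
        raise ValueError(f"didn't find '{name}' under '{parent.tag}'")
    return child

root = ET.fromstring("<scene><mesh/></scene>")
mesh = require_child(root, "mesh")     # present: returned normally
try:
    require_child(root, "camera")      # absent: descriptive error
except ValueError as e:
    print(e)
```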

Maybe I'm biased because I've worked with XML files for more than a decade, but I never spent more than 30 seconds debugging an XML parsing process.

Also, this was one of the first parts I "sealed" in the said codebase and never touched again, because it worked even if the incoming file was badly formed (by erroring out correctly and cleanly).

StopDisinfo910 2 days ago | parent [-]

> It's also pretty easy to emit, "I didn't find what I'm looking for under $ELEMENT" while parsing the file, or "I expected a string but I got $SOMETHING at element $ELEMENT".

I think we are actually in agreement. You could do exactly the same with a binary format without having to deal with the cumbersomeness of XML, which is my point.

You are already treating XML like one, writing errors in your own parsers and "sealing" it.

What's the added value of XML then?

bayindirh 2 days ago | parent [-]

> cumbersomeness of XML...

Telling the parser to navigate to the first element named $ELEMENT, checking a couple of conditions, and assigning values in a defensive manner is not cumbersome in my opinion.
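That navigate-check-assign pattern is short in practice; a hedged sketch in Python, where the element names and defaults are hypothetical:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<object><scale>2.5</scale></object>")

# Navigate to the first element named "scale" and assign defensively.
node = doc.find("scale")
scale = float(node.text) if node is not None and node.text else 1.0

# A missing element falls back to a safe default instead of crashing.
node = doc.find("rotation")
rotation = float(node.text) if node is not None and node.text else 0.0

print(scale, rotation)  # 2.5 0.0
```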

I would not call parsing binary formats cumbersome (I'm a demoscene fan, so I aspire to match their elegance and performance in my codebases), but it's not the pragmatic approach for that particular problem at hand.

So, we arrive at your next question:

> What's the added value of XML then?

It's several things. Let me try to explain.

First of all, it's a self-documenting text format. I don't need extensive documentation for it. I have a spec, but someone opening it in a text editor can see what it is and understand how it works. When half (or most) of the users of your code are non-CS researchers, that's a huge plus.

Speaking of non-CS researchers, these folks will be the ones generating these files from different inputs. Writing XML in any programming language, incl. FORTRAN and MATLAB (not kidding), is a thousand times easier than writing a binary blob.
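Emitting XML really is just string writing; here is a sketch of what such an export loop might look like, shown in Python but deliberately kept to plain formatting that translates line-for-line to FORTRAN or MATLAB (the format itself is invented):

```python
# A hypothetical list of measured points to export.
points = [(0.0, 1.0), (2.5, 3.5)]

# Nothing beyond string formatting and a loop is needed to write XML.
lines = ['<?xml version="1.0"?>', '<points version="1">']
for x, y in points:
    lines.append(f'  <point x="{x}" y="{y}"/>')
lines.append('</points>')

xml_text = "\n".join(lines)
print(xml_text)
```

Compare that with a binary blob, where the same loop would also have to get endianness, field widths, and offsets right.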

Expanding the file format I have developed on XML is extremely easy. You change a version number, maybe add a couple of paths to your parser, and you're done. If you feel fancy, allow for backwards compatibility, or just throw an error if you don't like the version (this is for non-CS folks mostly; I'm not that cheap). I don't need to work with nasty offsets or slight behavior differences causing me to pull my hair out.
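That version-gated evolution can be as simple as reading one attribute before dispatching; a sketch where the version numbers, element names, and defaults are all hypothetical:

```python
import xml.etree.ElementTree as ET

SUPPORTED = {"1", "2"}

def load(text: str) -> dict:
    """Parse a versioned document, rejecting versions we don't know."""
    root = ET.fromstring(text)
    version = root.get("version", "1")
    if version not in SUPPORTED:
        raise ValueError(f"unsupported format version {version}")
    data = {"version": version}
    if version == "2":
        # A path added in version 2; version-1 files simply lack it.
        data["units"] = root.findtext("units", default="mm")
    return data

print(load('<model version="2"><units>cm</units></model>'))
```

A version-1 file still loads through the same path, and an unknown version errors out cleanly instead of being misread.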

Preservation is much easier, too. Scientific software rots much quicker than conventional software, so keeping the file format readable is better for preservation.

"Sealing" in that project's parlance means "verify and don't touch it again". When you're comparing your results against a ground truth with 32 significant digits, you don't poke here and there leisurely. If it works, you add a disclaimer that the file is "verified on YYYYMMDD" and close it for modifications unless necessary. The same principle also holds for performance reasons.

So, building a complex file format on top of XML makes sense. It makes the format accessible, cross-platform, easier to preserve, and more.

scotty79 a day ago | parent | prev [-]

With this you get an efficient binary format and the generality of XML:

https://en.m.wikipedia.org/wiki/Efficient_XML_Interchange

But somehow Google forgot to implement this.

int_19h 2 days ago | parent | prev [-]

It's kinda funny to see "not human readable" as an argument in favor of JSON over XML, when the former doesn't even have comments.

queenkjuul a day ago | parent [-]

And yet, it's still easier for me to parse with my eyes