tannhaeuser 8 hours ago

The XML spec starts like this:

> The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML.

Here, "generic SGML" refers to markup beyond the basic HTML vocabulary hardcoded into browsers, such as SVG and MathML. XML was specifically designed so that mere parsing doesn't require element-specific rules such as SGML-derived HTML tag omission/inference, empty elements, and attribute short forms; it achieves this by excluding these features from the XML subset of SGML. Original SGML always required a DTD schema to inform the parser about these things, which HTML has to this day, and not just for legacy reasons either, i.e. new elements and attributes making use of these features are introduced all the time (cf. [1]).
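To make the parsing point concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element names and sample strings are made up for illustration. A fully tagged document parses with no knowledge of what `p` or `br` mean, while HTML-style markup relying on tag omission and an unclosed empty element is rejected, because an XML parser has no element-specific rules to infer the missing tags from:

```python
import xml.etree.ElementTree as ET

# Fully tagged XML: every element is explicitly closed, so the parser
# needs no vocabulary-specific knowledge about "p" or "br".
xml_doc = "<doc><p>one</p><p>two<br/></p></doc>"
root = ET.fromstring(xml_doc)
tags = [el.tag for el in root.iter()]  # elements in document order

# HTML-style markup (hypothetical sample): </p> end tags are omitted and
# <br> is an unclosed empty element. An SGML/HTML parser could infer the
# structure from a DTD or built-in rules; an XML parser cannot.
html_style = "<doc><p>one<p>two<br></doc>"
try:
    ET.fromstring(html_style)
    parsed_without_rules = True
except ET.ParseError:
    parsed_without_rules = False

print(tags)
print(parsed_without_rules)
```

The second input is perfectly meaningful to a parser that knows `p` can't contain `p` and `br` is declared empty, which is exactly the per-element information XML chose not to depend on.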

Now XML Schema (W3C's XML schema language, and by far the most used one) isn't very beautiful, but it is carefully crafted to be upwards compatible with DTDs in that it uses the same notion of automaton construction to decide admissibility of content models (XSD's Unique Particle Attribution rule), rooted in SGML's zero-lookahead design rationale that is also required for tag inference. Relax NG does away with this constraint, allowing a larger class of markup content models but only working with fully tagged XML markup.
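As a rough illustration of what such an admissibility check looks like, the sketch below builds a position automaton for a content model and tests whether the next element name always identifies a unique particle without lookahead. This is a simplified rendering of the idea, not XSD's actual algorithm: the tuple-based model encoding and the function names are hypothetical, and real UPA checking also has to deal with wildcards, occurrence ranges, and substitution groups.

```python
from itertools import count

def analyze(model):
    """Return (first, follow, names) for the position automaton of a model.

    A model is a nested tuple (hypothetical encoding):
    ("elem", name), ("seq", a, b), ("alt", a, b), ("star", a), ("opt", a).
    """
    names = {}    # position -> element name
    follow = {}   # position -> positions that may come directly after it
    ids = count()

    def walk(node):
        # Returns (nullable, first positions, last positions).
        kind = node[0]
        if kind == "elem":
            pos = next(ids)
            names[pos] = node[1]
            follow[pos] = set()
            return False, {pos}, {pos}
        if kind == "seq":
            n1, f1, l1 = walk(node[1])
            n2, f2, l2 = walk(node[2])
            for p in l1:
                follow[p] |= f2
            first = (f1 | f2) if n1 else f1
            last = (l1 | l2) if n2 else l2
            return n1 and n2, first, last
        if kind == "alt":
            n1, f1, l1 = walk(node[1])
            n2, f2, l2 = walk(node[2])
            return n1 or n2, f1 | f2, l1 | l2
        if kind == "star":
            _, f, l = walk(node[1])
            for p in l:
                follow[p] |= f
            return True, f, l
        if kind == "opt":
            _, f, l = walk(node[1])
            return True, f, l
        raise ValueError(f"unknown node kind: {kind}")

    _, first, _ = walk(model)
    return first, follow, names

def is_deterministic(model):
    """True if no state offers two different particles with the same name."""
    first, follow, names = analyze(model)
    for candidates in [first, *follow.values()]:
        seen = [names[p] for p in candidates]
        if len(seen) != len(set(seen)):
            return False
    return True

a, b, c = ("elem", "a"), ("elem", "b"), ("elem", "c")
ambiguous = ("alt", ("seq", a, b), ("seq", a, c))   # (a, b) | (a, c)
factored = ("seq", a, ("alt", b, c))                # a, (b | c)
print(is_deterministic(ambiguous), is_deterministic(factored))
```

Both models accept the same sequences, but only the factored one lets a parser decide which particle an `a` belongs to the moment it sees it. A schema language without this constraint, as Relax NG is described above, can accept the first form as written.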

XML became very popular for a while and, like JSON afterwards, was misused for all kinds of things: service payloads in machine-to-machine communication, configuration files, etc., but these non-use cases shouldn't be held against its design. As a markup language, while XML makes a reasonable delivery or archival language, it's a failure as an authoring language due to its rigidity/redundancy and verbosity, as is evident from the massive use of Markdown and other HTML short syntaxes supported by SGML but not XML.

[1]: https://sgmljs.sgml.net/docs/html5.html