Remix.run Logo
plorkyeran 5 days ago

There was a lot of pain and suffering along the way to html5, and html5 is the logical end state of postel's law: every possible sequence of bytes is a valid html5 document with a well-defined parsing, so there is no longer any room to be more liberal in what you accept than what the standard permits (at least so far as parsing the document).

emn13 5 days ago | parent [-]

Getting slightly off topic, but I think it's hard to find the right terminology to talk about html's complexities. As you point out, it isn't really a syntax anymore now that literally every sequence is valid. Yet the parsing rules are obviously not as simple as a .* regex. It's syntactically simple, but structurally complex? What's the right term for the complexity represented by how the stack of open elements interacts with self-closing or otherwise special elements?

Anyhow, I can't say I'm thrilled that some deeply nested subtree of divs for instance might be closed by a open-button tag just because they were themselves part of a button, except when... well, lots of exceptions. It's what we have, I guess.

It's also not a (fully) solved problem; just earlier this year I had to work around an issue in the chromium html parser that caused IIRC quadratic parsing behavior in select items with many options. That's probably the most widely used parser in the world, and a really inanely simple repro. I wonder whether stuff like that would slip through as often were the parsing rules at all sane. And of course encapsulation of a document-fragment is tricky due to the context-sensitivity of the parsing rules; many valid DOM trees don't have an HTML serialization.