hrmtst93837 3 hours ago
If you want a parser that actually conforms to the XML spec and handles its edge cases, parsing stops being a matter of reading a human-readable config file and turns into O(n^2) string handling. The funny part is how often people silently accept partial or broken XML in prod, because revisiting schema validation years later is a nightmare. If you want cheap parsing, you end up writing a regex or DOM walker and hoping for the best, which raises the question of why not just use JSON or invent a different DSL to begin with.
sparkie an hour ago | parent
(Properly formatted) XML can be parsed, and streamed, by a visibly pushdown automaton[1][2]. "Visibly Pushdown Expressions"[3] can simplify parsing with a terse syntax styled after regular expressions, and there's an extension to SQL which can query XML documents using VPAs[4].

JSON can also be parsed and validated with visibly pushdown automata. There's an interesting project[5] which aims to automatically produce a VPA from a JSON Schema to validate documents.

In theory these should be able to outperform parsers based on deterministic pushdown automata (i.e., (LA)LR parsers), but they're less widely used and understood, as they're much newer than the conventional parsing techniques and absent from the popular literature (Dragon Book, EAC, etc.).

[1]: https://madhu.cs.illinois.edu/www07.pdf
[2]: https://www.cis.upenn.edu/~alur/Cav14.pdf
[3]: https://homes.cs.aau.dk/~srba/courses/MCS-07/vpe.pdf
[4]: https://web.cs.ucla.edu/~zaniolo/papers/002_R13.pdf
[5]: https://www.gaetanstaquet.com/ValidatingJSONDocumentsWithLea...
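To make the "visibly pushdown" idea concrete, here is a minimal sketch in Python. It is not taken from any of the linked papers. The tokenizer, the names `tokenize` and `well_nested`, and the tiny XML subset (plain open tags, close tags, and text; no attributes, comments, CDATA, or self-closing tags) are all assumptions made for illustration. The point it tries to show is that the input symbol alone decides the stack action: an open tag always pushes, a close tag always pops, and text never touches the stack. A real VPA also has a finite stack alphabet, so this corresponds to the case of a fixed tag vocabulary.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Hypothetical tokenizer for a tiny XML subset: <name>, </name>, and text.
# Attributes, comments, CDATA, entities, and self-closing tags are not handled.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)>|([^<]+)")


def tokenize(doc: str) -> Iterator[Tuple[str, str]]:
    """Yield (kind, value) pairs, where kind is 'call', 'return', or 'internal'."""
    pos = 0
    while pos < len(doc):
        m = TOKEN.match(doc, pos)
        if m is None:
            raise ValueError(f"unrecognized input at offset {pos}")
        pos = m.end()
        if m.group(3) is not None:
            yield ("internal", m.group(3))   # text: no stack action
        elif m.group(1):
            yield ("return", m.group(2))     # close tag: must pop
        else:
            yield ("call", m.group(2))       # open tag: must push


def well_nested(tokens: Iterable[Tuple[str, str]]) -> bool:
    """Single pass over the token stream; memory grows with nesting depth only."""
    stack: List[str] = []
    for kind, value in tokens:
        if kind == "call":
            stack.append(value)
        elif kind == "return":
            # Reject an unmatched close tag or a close tag with the wrong name.
            if not stack or stack.pop() != value:
                return False
        # 'internal' symbols leave the stack untouched.
    return not stack  # every open tag must have been closed


if __name__ == "__main__":
    print(well_nested(tokenize("<a><b>hi</b></a>")))
    print(well_nested(tokenize("<a><b></a></b>")))
```

Because `tokenize` is a generator, the check runs in a streaming fashion without building a DOM. This sketch only checks nesting; validating against a schema would require tracking automaton states as well.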