Remix.run Logo
zetafunction 6 hours ago

https://issues.chromium.org/issues/451401343 tracks work needed in the upstream xml-rs repository, so it seems like the team is working on addressing issues that would affect standards compliance.

Disclaimer: I work on Chrome and have occasionally dabbled in libxml2/libxslt in the past, but I'm not directly involved in any of the current work.

inejge 6 hours ago | parent | next [-]

I hope they will also work on speeding it up a bit. I needed to go through 25-30 MB SAML metadata dumps, and an xml-rs pull parser took 3x more time than the equivalent in Python (using libxml2 internally, I think.) I rewrote it all with quick-xml and got a 7-8x speedup over Python, i.e., at least 20x over xml-rs.

nwellnhof 3 hours ago | parent [-]

Python ElementTree uses Expat, only lxml uses libxml2. Right now, I'm working on SIMD acceleration in my not yet released, GPL-licensed fork of libxml2. If you have lots of character data or large attribute values like in SVG, you will see tremendous speed improvements (gigabytes per second). Unfortunately, this is unlikely to make it into web browsers.

Ygg2 6 hours ago | parent | prev [-]

Wait. They are going along with a XML parser that supports DOCTYPES? I get XSLT is ancient and full of exploits, but so is DOCTYPE. Literally poster boy for billion laughs attack (among other vectors).

mananaysiempre 6 hours ago | parent | next [-]

You don't need DOCTYPE for that, you can put an ENTITY declaration straight in your source file ("internal subset") and the XML spec it needs to be processed. (I seem to recall someone saying that Adobe tools are fond of putting those in their exported SVG files.)

Mikhail_Edoshin 5 hours ago | parent | prev | next [-]

The billion laughs bug was fixed in libxml2 in 2008. (As far as I understand in .Net this bug was fixed in 2014 with .Net 4.5.2. In 2019 a bug similar to "billion laughs" was found in Go YAML parser although it was explicitly mentioned and forbidden by YAML specs. Among other products it affected Kubernetes.)

Other vectors probably mean a single vector: external entities, where a) you process untrusted XML on server and b) allow the processor to read external entities. This is not a bug, but early versions of XML processors may lack an option to disallow access to external entities. This also has been fixed.

XSLT has no exploits at all, that is no features that can be misused.

fabrice_d 6 hours ago | parent | prev [-]

The billion laughs attack has well known solutions (basically, don't recurse too deep). It's not a reason to not implement DOCTYPE support.