throw0101a 5 days ago

> Do you really need all those features?

"You" probably do not.

But different "yous" need different features, and so they get all glommed together into one big thing. So no one needs "all" of libxml2/XML's features; each individual needs a different subset.

bartread 5 days ago

It's the same as the old joke about Microsoft Word: people only use 10% of Word's functionality, but the problem is each person uses a different 10%.

Of course this is an oversimplification, and there will no doubt be some sort of long tail, but it expresses the challenge well. I'd imagine the same is true for many other reasonably complex libraries, frameworks, or applications.

agwa 5 days ago

XML without DTDs is a very reasonable subset that eliminates significant complexity (no need for an HTTP client!) and security risks (no custom character entities that are infinitely recursive or read /etc/passwd!) and would probably still work for >80% of users.

(I wrote such an XML parser a long time ago.)
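To make the "XML without DTDs" idea concrete, here is a minimal sketch using Python's standard-library expat bindings. It is not the parser mentioned above; the names `DTDForbidden` and `parse_without_dtd` are made up for illustration. It rejects any document that contains a DOCTYPE declaration, so custom entities can never be defined in the first place.

```python
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Hypothetical error type: raised when a document declares a DOCTYPE."""


def parse_without_dtd(data):
    """Parse `data` and return element names in document order.

    Any DOCTYPE declaration aborts parsing, which rules out custom
    entity definitions (recursive expansion, external file references).
    """
    names = []
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called as soon as expat sees "<!DOCTYPE ...", before any
        # entity declared inside it could be used.
        raise DTDForbidden("DOCTYPE declarations are not allowed")

    def on_start_element(name, attrs):
        names.append(name)

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start_element
    parser.Parse(data, True)  # True marks this as the final chunk
    return names
```

Plain documents parse as usual, while anything with a DOCTYPE raises `DTDForbidden` before its contents are processed. Real applications would likely reach for an existing hardened parser rather than a hand-rolled check like this.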

jlarocco 5 days ago

Why throw out numbers when we all know you haven't actually measured that it's >80%?

In any case, the tooling around XML (DTDs, XPath, XSLT, etc.) is the reason to use it. I would go so far as to say the (supposed) >80% not using those features shouldn't have used XML in the first place.

tracker1 4 days ago

I agree, which is part of why I generally dislike using XML for most things.

x0x0 3 days ago

Not to mention that libxml2 underlies things like nokogiri (the commonly used html parsing gem for ruby), beautifulsoup (python's equivalent), etc.

dragonwriter 3 days ago

Pretty sure beautifulsoup uses python’s builtin html.parser but can optionally use html5lib or lxml if installed, and it is lxml, not beautifulsoup, that actually depends on libxml2.

You’re right about nokogiri, though.

x0x0 3 days ago

Ah, you're right. In the codebase I'm familiar with, lxml is used for performance, though it's not the default.