| ▲ | AndyKelley 5 days ago |
| If you think you need libxml2, think again. XML is a complex beast. Do you really need all those features? Maybe a much smaller, more easily maintained library would suit your needs while performing better at the same time! For instance, consuming XML and creating it are two very different use cases. Zooming into consuming it, perhaps your input data has more guarantees than libxml2 assumes, such as the nonexistence of meta definition tags. |
|
| ▲ | throw0101a 5 days ago | parent | next [-] |
| > Do you really need all those features? "You" probably do not. But different "yous" need different features, and so they all get glommed together into one big thing. So no one needs "all" of libxml2/XML's features; each individual needs a different subset. |
| |
| ▲ | bartread 5 days ago | parent | next [-] | | It's the same as the old joke about Microsoft Word: people only use 10% of Word's functionality, but the problem is each person uses a different 10%. Of course this is an oversimplification, and there will no doubt be some sort of long tail, but it expresses the challenge well. I'd imagine the same is true for many other reasonably complex libraries, frameworks, or applications. | |
| ▲ | agwa 5 days ago | parent | prev | next [-] | | XML without DTDs is a very reasonable subset that eliminates significant complexity (no need for an HTTP client!) and security risks (no custom character entities that are infinitely recursive or read /etc/passwd!) and would probably still work for >80% of users. (I wrote such an XML parser a long time ago.) | | |
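A minimal sketch of the "XML without DTDs" idea, using Python's standard `xml.parsers.expat` binding. It is not agwa's parser; the names `DTDForbidden` and `parse_without_dtd` are made up for illustration. The only policy it shows is refusing any document that carries a DOCTYPE declaration.

```python
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised when the input contains a DOCTYPE declaration (hypothetical name)."""


def parse_without_dtd(data: bytes) -> list:
    """Parse XML, refusing any document that declares a DTD.

    Returns the element names in document order.
    """
    names = []
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Bail out as soon as the DOCTYPE starts, before this code ever
        # looks at entity declarations or external identifiers.
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(data, True)  # True: this is the whole document
    return names
```

Rejecting the DOCTYPE outright is stricter than ignoring it, but it keeps the supported subset easy to document: a producer either stays inside it or gets an immediate error.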
| ▲ | jlarocco 5 days ago | parent [-] | | Why throw out numbers when we all know you haven't actually measured that it's >80%? In any case, the tooling around XML (DTDs, XPath, XSLT, etc.) is the reason to use it. I would go so far as to say the (supposed) >80% not using those features shouldn't have used XML in the first place. | | |
| ▲ | tracker1 4 days ago | parent [-] | | I agree.. which is part of why I generally dislike using XML for most things. |
|
| |
| ▲ | x0x0 3 days ago | parent | prev [-] | | Not to mention that libxml2 underlies things like nokogiri (the commonly used html parsing gem for ruby), beautifulsoup (python's equivalent), etc. | | |
| ▲ | dragonwriter 3 days ago | parent [-] | | Pretty sure beautifulsoup uses python’s builtin html.parser but can optionally use html5lib or lxml if installed, and it is lxml, not beautifulsoup, that actually depends on libxml2. You’re right about nokogiri, though. | | |
| ▲ | x0x0 3 days ago | parent [-] | | Ah, you're right, in the codebase I'm familiar with lxml is used for performance, though it's not the default. |
|
|
|
|
| ▲ | mort96 5 days ago | parent | prev | next [-] |
| I kinda want something which just treats XML as a dumb tree definition language... give me elements with attributes as string key/value pairs, and children as an array of elements. And have a serialiser in there as well, it shouldn't hurt. Basically something that behaves like your typical JSON parser and serialiser, but for XML. To my knowledge, this is what TinyXML2 does, and I've used TinyXML2 for this before to great effect. |
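For a rough picture of that "JSON-like" model, here is a sketch built on Python's standard `xml.etree.ElementTree` rather than TinyXML2. The dict layout and the function names (`to_tree`, `from_tree`, `parse`, `serialise`) are assumptions for illustration. It deliberately ignores mixed content (text after a child element), comments, and processing instructions.

```python
import xml.etree.ElementTree as ET


def to_tree(elem: ET.Element) -> dict:
    """Convert an element into a plain dict: tag, string attributes, text, children."""
    return {
        "tag": elem.tag,
        "attrs": dict(elem.attrib),
        "text": (elem.text or "").strip(),
        "children": [to_tree(child) for child in elem],
    }


def from_tree(node: dict) -> ET.Element:
    """Rebuild an element from the dict layout produced by to_tree."""
    elem = ET.Element(node["tag"], node["attrs"])
    if node["text"]:
        elem.text = node["text"]
    for child in node["children"]:
        elem.append(from_tree(child))
    return elem


def parse(text: str) -> dict:
    return to_tree(ET.fromstring(text))


def serialise(node: dict) -> str:
    return ET.tostring(from_tree(node), encoding="unicode")
```

With this shape, `parse('<config version="1"><item name="a"/></config>')` gives a nested dict whose `attrs` and `children` can be walked like any JSON value, and `serialise` turns it back into markup.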
| |
| ▲ | cHaOs667 5 days ago | parent [-] | | That's what you call a DOM parser. The problem with them is that, because they materialize every element as an object in memory, bigger XML files tend to eat up all of your RAM. This is where SAX2 parsers come into play: you define event-based callbacks that process the data as it streams past. | | |
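To make the DOM/SAX contrast concrete, here is a small sketch with Python's standard `xml.sax`. The `record` element and its `bytes` attribute are invented for the example. The handler keeps only two counters, so memory use does not grow with the number of elements.

```python
import io
import xml.sax


class RecordCounter(xml.sax.ContentHandler):
    """Counts <record> elements and sums their 'bytes' attribute as events arrive."""

    def __init__(self):
        super().__init__()
        self.count = 0
        self.total_bytes = 0

    def startElement(self, name, attrs):
        # Called once per opening tag; no tree is built behind the scenes.
        if name == "record":
            self.count += 1
            self.total_bytes += int(attrs.get("bytes", "0"))


def summarise(stream) -> RecordCounter:
    """Run the handler over a file-like object and return it with its totals."""
    handler = RecordCounter()
    xml.sax.parse(stream, handler)
    return handler


# Usage: any binary file-like object works; a real export would be open(path, "rb").
sample = io.BytesIO(b'<log><record bytes="10"/><record bytes="32"/><other/></log>')
stats = summarise(sample)
print(stats.count, stats.total_bytes)
```

The trade-off is that the callbacks see one event at a time, so any context (the current parent, a partially read record) has to be tracked by hand in the handler.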
| ▲ | mort96 5 days ago | parent [-] | | The solution is simple: don't have XML files that are many gigabytes in size. | | |
| ▲ | iberator 5 days ago | parent | next [-] | | A lot of telecom stuff dumps multiple GB of XML hourly. Per BTS. Processing a few TB of XML files on one server daily is doable, just use the right tools and hacks :) Processing schema-less or broken-schema stuff is always hilarious. Good times. | | |
| ▲ | senorrib 5 days ago | parent [-] | | Lol I love the upbeat tone here. Helps me deal with my PTSD after working with XML files. |
| |
| ▲ | cHaOs667 5 days ago | parent | prev | next [-] | | Depending on the XML structure and the server's RAM, it can already happen as you approach 80-100 MB file sizes. And to be fair, in the Enterprise context, you are quite often not in a position to decide how big the export of another system is. But yes, back in 2010 we built preprocessing systems that checked XMLs and split them up into smaller chunks if they exceeded a certain size. | |
| ▲ | lyu07282 5 days ago | parent | prev | next [-] | | Tell that to wikimedia, I've used libxml's SAX parser in the past to parse 80GB+ xml dumps. | |
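A sketch of one way to walk a very large dump without holding it in memory, using `ElementTree.iterparse` from Python's standard library instead of libxml's SAX interface. The `page` and `title` tag names are placeholders; a real dump may use namespaced tags, which would need the full `{namespace}tag` form.

```python
import io
import xml.etree.ElementTree as ET


def iter_titles(stream, record_tag="page", field_tag="title"):
    """Yield the title text of each record, discarding each record once it is read."""
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            yield elem.findtext(field_tag, default="")
            # Drop finished records from the root so the tree never accumulates.
            root.clear()


# Usage: a real dump would be opened with open(path, "rb") instead of BytesIO.
dump = io.BytesIO(
    b"<dump><page><title>A</title></page><page><title>B</title></page></dump>"
)
for title in iter_titles(dump):
    print(title)
```

Each record is still built as a small tree, which is more convenient than raw SAX callbacks, while `root.clear()` keeps the overall footprint roughly constant.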
| ▲ | stuaxo 5 days ago | parent | prev [-] | | Some formats are this and they are historical formats. |
|
|
|
|
| ▲ | remus 5 days ago | parent | prev | next [-] |
| This process usually goes: 1. "This XML library is way bigger than what I need, I'll write something more minimal for my use case" 2. write a library for whatever minimal subset you need 3. crash report comes in, realise you missed off some feature x. Add support for some feature x. 4. Bob likes your library. So small, so elegant. He'd love to use it, if only you supported feature y, so you add support for feature y. ... End result is x+1 big, complex XML libraries. Obviously I'm being a bit obtuse here because you might be able to guarantee some subset of it in whatever your specific circumstances are, but I think it's hard to do over a long period of time. If people think you're speaking XML then at some point they'll say "why don't we use this nice XML feature to add this new functionality". |
| |
| ▲ | bayindirh 5 days ago | parent | next [-] | | If you want to read some XML quickly, there's always RapidXML and PugiXML, but if you need a big gun, there's libXML. The former are blazingly fast. In the real world they can parse instantly. So alternatives do exist for different use cases. | |
| ▲ | hulitu 4 days ago | parent | prev [-] | | > Obviously I'm being a bit obtuse here No. This is the first good explanation for the library hell in Linux these days. |
|
|
| ▲ | jeroenhd 5 days ago | parent | prev | next [-] |
| XML is used in countless standards. You can't just not use it if you interact with the outside world. Every XML feature is still in the many XML libraries because someone has a need for it, even things like external entities. Maybe you don't need libxml2 specifically (good luck finding an alternative to parse XML in C and other such languages though), but "I don't like the complex side of XML so let's pretend it doesn't exist" doesn't solve the problem most people pick libxml2 for. It's the de-facto standard because it supports everything you could possibly need. |
| |
| ▲ | AndyKelley a day ago | parent | next [-] | | It's common for both the producer of XML and the consumer of XML for any given application to be using a dramatically smaller subset of the standard. Well-engineered software is intentional about this and documents those limitations. Under these conditions it's perfectly valid to use a library that only supports this subset. Furthermore, those subsets have natural "fault lines", influenced by the burden:utility ratio. This makes consumers and producers naturally coordinate on a subset. It's not like another commenter here said about everyone needing different features. My argument is therefore that there is value in having different libraries for different subsets - with the smallest subset being much simpler than libxml2. | |
| ▲ | dontlaugh 5 days ago | parent | prev | next [-] | | Exactly. For example if you need to integrate SAML, you have to support a significant subset of several XML specs. It may be possible to write a SAML-only library that supports less, but it's not clear it would be any simpler. | |
| ▲ | lyu07282 5 days ago | parent | prev [-] | | You shouldn't be downvoted; it's just the truth, no matter how unfortunate. |
|
|
| ▲ | pferde 5 days ago | parent | prev | next [-] |
| There is always libexpat, which works very well, also for the streaming case. |
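For the streaming case, a short sketch using `xml.parsers.expat`, the Python standard-library binding to libexpat. The function name `count_elements` and the chunked input are made up for illustration. The point is that the parser accepts data in arbitrary pieces, even when a tag is split across two chunks.

```python
import xml.parsers.expat


def count_elements(chunks) -> dict:
    """Feed an iterable of byte chunks to expat and count elements by name."""
    counts = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        counts[name] = counts.get(name, 0) + 1

    parser.StartElementHandler = on_start
    for chunk in chunks:
        parser.Parse(chunk, False)  # False: more data may follow
    parser.Parse(b"", True)  # True: signal the end of the document
    return counts
```

In practice the chunks would come from a socket or a file read in fixed-size blocks, so the whole document never has to be in memory at once.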
| |
|
| ▲ | EvanAnderson 5 days ago | parent | prev [-] |
| Gratuitous use of XML does sometimes smell like a "now you have two problems" kind of affair. |