Remix.run Logo
bawolff a day ago

Entities are also probably a significant reason for XML's bad reputation. Between using them to leak senditive files (XXE), and recursive expansion being a DoS vector (million laughs), it cemented XML's reputation as a security risk.

In the browser context though, i don't see why they couldn't just follow same origin rules.

tannhaeuser a day ago | parent [-]

I was surprised this wasn't brought up earlier, as it always is in these discussions. It's trivial to mitigate security risk arising out the use of entities. SGML even has the CAPACITY ENTLVL built-in entity nesting limit.

TBH, a browser has one million other potential leaks and exploits and denials to care about. Unlike a markup language with carefully crafted rules about what to expand in which context, a browser providing a Turing-complete language trivially runs into infinite loops or recursion and could at best heuristically terminate JS code, after using laborious best-efforts methods such as execution traces and watchdog timers. Moreover, JS syntax itself and how it's represented in markup unneccessarily gives rise to obfuscation and escaping attacks. And the browser in addition also plays fast and loose with an ever-expanding, potentially Turing-complete styling language with super complicated styling syntax and constraint-based layout models with a complete lack of formal semantics or any other formal reasoning. CSS itself can contain HTML literals at several places such as data URIs and the content property with questionable escaping, but I guess this is par for the course when CSS is basically just tunneling yet another syntax through HTML attributes and elements. Context Security Policy is just bolted on to limit this pointless and careless gratuitious syntax proliferation, and of course must invent its own little item-value microsyntax to be tunneled through HTTP headers and HTML header tags.

Considering these glaring and obvious deficits weren't discussed at the same time I assume those who brought up XXE and suggested SGML/XML could leak documents or smuggle executable instructions did so in bad faith or were repeating hearsay to sound clever or something.

bawolff a day ago | parent [-]

This is mostly not an issue in browsers so much as non browser implementations. Its not even really an issue in modern implementations with better defaults.

What it did do is create a perception that XML is insecure. This hastened its demise.

> It's trivial to mitigate security risk arising out the use of entities.

Obviously. However in practise, historically most implementations did not. At least not by default.

XML spec bears some responsibility for this for not being explicit about suggesting secure defaults.

Regardless, JSON won partially because it didnt have the attack surface, so people didn't have to worry. XML being theoretically easy to secure means nothing when practically implementations made it difficult.