| ▲ | foota 3 days ago | parent | next [-] | | Yeah, OP's code is asking for pain. I suspect there are now developers who've never had to generate HTML outside the confines of a framework and so are completely unaware of the kinds of attacks you need to protect yourself against. You can do it from scratch, but you essentially need to track the provenance of strings: either a string is not HTML and needs to be escaped (e.g., user input), or it is HTML (either generated, with escaping already done, or static code). It seems like you could build this reasonably simply by using tagged template literals and having, e.g., two different types of strings that are used to track provenance. | | | |
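A minimal sketch of the tagged-template idea above, assuming two kinds of strings: plain `string` (untrusted, always escaped) and a wrapper class for HTML that has already been escaped. The names `SafeHtml`, `escapeHtml` and `html` are hypothetical and not taken from any library or from the original post.

```typescript
// Hypothetical names throughout; this is an illustration, not a library API.

/** Marks a string as serialised HTML whose escaping has already been done. */
class SafeHtml {
  readonly value: string;
  constructor(value: string) {
    this.value = value;
  }
}

/** Encode plain text for HTML data or a quoted attribute value. */
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;") // must run first to avoid double-encoding
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

type Piece = SafeHtml | string | number;

/** SafeHtml passes through unchanged; everything else is treated as untrusted text. */
function renderPiece(piece: Piece): string {
  return piece instanceof SafeHtml ? piece.value : escapeHtml(String(piece));
}

/**
 * Tagged template: the static parts are trusted because they are source code,
 * and every interpolated value is escaped unless it is already SafeHtml.
 */
function html(strings: TemplateStringsArray, ...values: Array<Piece | Piece[]>): SafeHtml {
  const parts = values.map((v) => (Array.isArray(v) ? v.map(renderPiece).join("") : renderPiece(v)));
  return new SafeHtml(strings.map((s, i) => s + (parts[i] ?? "")).join(""));
}

// Usage: user input is escaped, nested html`` results are not escaped twice.
const userInput = "<img src=/ onerror=alert('pwnd')>";
const item = html`<li>${userInput}</li>`;
const list = html`<ul>${[item, html`<li>${"Alpha & Beta"}</li>`]}</ul>`;
// list.value is:
// <ul><li>&lt;img src=/ onerror=alert(&#39;pwnd&#39;)&gt;</li><li>Alpha &amp; Beta</li></ul>
```

This only covers element content and quoted attribute values. Interpolating into a `<script>` block, a URL, or an unquoted attribute would need different encoders, so a real implementation would have to be aware of where each hole sits in the template.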
| ▲ | lylejantzi3rd 3 days ago | parent | prev [-] | | Posts are sanitized on the server side. This is client side code. | | |
| ▲ | chrismorgan 3 days ago | parent | next [-] | | Although appealing, that’s an extremely bad idea, when you’re limited to JavaScript. In a language with a better type system, it can be only a very bad idea. The problem is that different contexts have different escaping rules. It’s not possible to give a one-size-fits-all answer from the server side. It has to be done in a context-aware way. Field A is plain text. Someone enters the value “Alpha & Beta”. Now, what does your server do? If it sanitises by stripping HTML characters, you’ve just blocked valid input; not good. If it doesn’t sanitise but instead unconditionally escapes HTML, somewhere, sooner or later, you’re going to end up with an “Alpha &amp; Beta” shown to the user, when the value gets used in a place that isn’t taking serialised HTML. It always happens sooner or later. (If it doesn’t sanitise or escape, and the client doesn’t escape but just drops it directly into the serialised HTML, that’s an injection vulnerability.) Field B is HTML. Someone enters the value “<img src=/ onerror=alert('pwnd')>”. Now, what does your server do? If it sanitises by applying a tag/attribute whitelist so that you end up with perhaps “<img src="/">”, fine. | | |
| ▲ | krapp 3 days ago | parent | next [-] | | Server-side templating frameworks had context-aware escaping strategies for years before front end frameworks were even a thing. Injection attacks don't persist because this is a hard problem, they persist because security is not a priority over getting a minimum viable product to market for most webdev projects. The old tried and true strategy of "never sanitize data, push to the database with prepared statements and escape in the templates" is basically bulletproof. | |
| ▲ | naasking 3 days ago | parent | prev | next [-] | | You're unnecessarily complicating this. The server is aware of what fields are HTML so it just encodes the data that it returns like we've been doing for 30 years now. If your point is that this approach is only good with servers that you trust, then that's useful to point out, although we kind of already are vulnerable to server data. | | |
| ▲ | chrismorgan 2 days ago | parent [-] | | You’re not getting it: we’re not talking about the server producing templated HTML, which is fine; but rather the server producing JSON, and then the client dropping strings from that object directly into serialised HTML. That’s a problem, because the only way to be safe is to entity-encode everything, but then when you use a string in a context that doesn’t use HTML syntax, you’ll get the wrong result. It’s not an unnecessary complication. You fundamentally need to know what format you’re embedding something into, in order to encode it, and the server can’t know that. Depending on what you do, you may want it unencoded, encoded for HTML data or double-quoted attribute value state (& → &amp;, < → &lt;, " → &quot;), encoded for a URL query string parameter value (percent-encoding but with & → %26 as well), and there are several more reasonable possibilities even in the browser frontend context. These encodings are incompatible, therefore it’s impossible for the server to just choose one and have it work everywhere. | | |
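To make the incompatibility concrete, here is a small sketch that encodes the same value for two of the contexts mentioned above. The helper names `encodeHtml` and `encodeQueryValue` are hypothetical; `encodeURIComponent` is the standard JavaScript built-in.

```typescript
/** Encode text for HTML data or a double-quoted attribute value. */
function encodeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/"/g, "&quot;");
}

/** Encode text as a URL query string parameter value. */
function encodeQueryValue(text: string): string {
  return encodeURIComponent(text);
}

const fieldA = "Alpha & Beta";

// Each context needs its own encoding of the raw value.
const forHtml = encodeHtml(fieldA); // "Alpha &amp; Beta"
const forQuery = encodeQueryValue(fieldA); // "Alpha%20%26%20Beta"

// If the server had already entity-encoded the value, any client code that
// treats it as raw text produces the wrong result in the other context:
const preEncodedByServer = forHtml;
const brokenQuery = encodeQueryValue(preEncodedByServer); // "Alpha%20%26amp%3B%20Beta"
const doubleEncoded = encodeHtml(preEncodedByServer); // "Alpha &amp;amp; Beta"
```

The last two values are the failure modes described in the thread: a search link that looks for the literal text `Alpha &amp; Beta`, and a page that displays `Alpha &amp; Beta` to the user.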
| ▲ | naasking 14 minutes ago | parent [-] | | > It’s not an unnecessary complication. You fundamentally need to know what format you’re embedding something into, in order to encode it, and the server can’t know that. There are two cases here: 1. Backend endpoints are specifically tied to the view being generated (returns viewmodels), in which case the server knows what the client is rendering and can encode it. This frankly should be the default approach because it minimizes network traffic and roundtrips. The original code displayed is perfectly fine in this case. 2. Endpoints are generic and the client assembles views by making multiple requests to various endpoints and takes on the responsibilities that server-side frameworks used to handle, including encoding. |
|
| ▲ | hombre_fatal 3 days ago | parent | prev [-] | | Server-side sanitization means that your view code is inherently vulnerable to injection. You'll notice in modern systems you don't sanitize data in the database and you don't have to manually sanitize when rendering frontend code. It's like that for a reason. Server-side sanitization and XSS injection should be left in the 2000s PHP era. | | |
| ▲ | jdsleppy 3 days ago | parent [-] | | Where do you suggest we sanitize values? Only in the client, when rendering them? | | |
| ▲ | chrismorgan 3 days ago | parent [-] | | Depends on what you mean by sanitising. If you mean filtering out undesirable parts of a document (e.g. disallowing the <script> element or the onclick attribute), that should normally be done on the server, before storage. If instead you mean serialising, writing a value into a serialised document: then this should be done at the point you’re creating the serialised document. (That is, where you’re emitting the HTML.) But the gold standard is not to generate serialised HTML manually, but to generate a DOM tree, and serialise that (though sadly it’s still a tad fraught because HTML syntax is such a mess; it works better in XML syntax). This final point may be easier to describe by comparison to JSON: do you emit a JSON response by writing `{`, then writing `"some_key":`, then writing `[`, then writing `"\"hello\""` after carefully escaping the quotation marks, and so on? You can, but in practice it’s very rarely done. Rather, you create a JSON document, and then serialise it, e.g. with JSON.stringify inside a browser. In like manner, if you construct a proper DOM tree, you don’t need to worry about things like escaping. | | |
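The JSON comparison can be sketched in code. The block below is not the browser DOM API; it assumes a tiny hypothetical tree type (`HtmlNode`) and serialiser (`serialise`) purely to show that escaping happens once, inside the serialiser, rather than at every call site.

```typescript
// JSON: build a value, then let the serialiser handle the escaping.
const json = JSON.stringify({ some_key: ['"hello"'] });
// json is: {"some_key":["\"hello\""]}

// HTML analogue, using a hypothetical minimal tree type (not the real DOM).
type HtmlNode =
  | { kind: "text"; text: string }
  | { kind: "element"; tag: string; attrs: Array<[string, string]>; children: HtmlNode[] };

function escapeText(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

/**
 * Serialise a tree to HTML. Assumes tag and attribute names come from trusted
 * code. Void elements (img, br) and raw-text elements (script, style) are not
 * handled, which is part of why real HTML serialisation is fraught.
 */
function serialise(node: HtmlNode): string {
  if (node.kind === "text") {
    return escapeText(node.text);
  }
  const attrs = node.attrs.map(([key, value]) => ` ${key}="${escapeAttr(value)}"`).join("");
  const inner = node.children.map(serialise).join("");
  return `<${node.tag}${attrs}>${inner}</${node.tag}>`;
}

// The caller works with raw values only and never escapes anything by hand.
const paragraph: HtmlNode = {
  kind: "element",
  tag: "p",
  attrs: [["title", 'say "hi"']],
  children: [{ kind: "text", text: "Alpha & Beta <3" }],
};
const markup = serialise(paragraph);
// markup is: <p title="say &quot;hi&quot;">Alpha &amp; Beta &lt;3</p>
```

In a browser the same effect comes from building real nodes (for example with `document.createElement` and `textContent`) instead of concatenating strings into `innerHTML`.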
| ▲ | juliend2 3 days ago | parent [-] | | What's wrong with filtering before saving is that if you forget a rule, you have to go back and re-filter already-saved data in the db (with some one-off script). I think "normally" we should instead filter for XSS injections when we generate the DOM tree, or just before (such as when passing backend data to the frontend, if that makes more sense). | | |
| ▲ | zdragnar 3 days ago | parent [-] | | Don't forget that different clients or view formats (apps, export to CSV, etc) all have their own sanitization requirements. Sanitize at your boundaries. Data going to SQL? Apply SQL-specific sanitization. Data going to Mongo? Same. HTML, JSON, markdown, CSV? Apply the view-specific sanitizing on the way. The key difference is that, if you deploy a JSON API that is view agnostic, the client now needs to apply the sanitization. That's a requirement of an agnostic API. | | |
| ▲ | chrismorgan 2 days ago | parent [-] | | Please don’t use the word sanitising for what you seem to be describing: it’s a term more commonly used to mean filtering out undesirable parts. Encoding for a particular serialised format is a completely different, and lossless, thing. You can call it escaping or encoding. | | |
| ▲ | zdragnar 2 days ago | parent [-] | | Sanitizing is just a form of encoding that prevents data from becoming executable unintentionally. | | |
| ▲ | chrismorgan 2 days ago | parent [-] | | I don’t like how you’re categorising things. Sanitising is absolutely nothing to do with encoding. You can sanitise without encoding, you can encode without sanitising, or you can do both in sequence; and all of these combinations are reasonable and common, in different situations. And sanitising may operate on serialised HTML (risky), or on an HTML tree (both easier and safer). Saying sanitising is a form of encoding is even less accurate than saying that a paint-mixing stick is a type of paint brush. You can mix paint without painting it, and you can paint without mixing it first. |
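The distinction can be shown with two unrelated functions. This is a sketch under stated assumptions: the tree type `TreeNode`, the allowlists, and the names `sanitise`, `encodeHtmlText` and `decodeHtmlText` are all hypothetical. Sanitising here is a lossy filter over a tree and encodes nothing; encoding is a lossless, reversible transformation of a string and filters nothing.

```typescript
type TreeNode =
  | { kind: "text"; text: string }
  | { kind: "element"; tag: string; attrs: Array<[string, string]>; children: TreeNode[] };

// Illustrative allowlists only; a real policy would be larger and would also
// need to vet attribute values (for example URL schemes in src).
const ALLOWED_TAGS = new Set(["p", "b", "i", "img"]);
const ALLOWED_ATTRS = new Set(["src", "alt"]);

/** Sanitising: drop disallowed elements and attributes. Lossy, no encoding involved. */
function sanitise(nodes: TreeNode[]): TreeNode[] {
  const kept: TreeNode[] = [];
  for (const node of nodes) {
    if (node.kind === "text") {
      kept.push(node);
    } else if (ALLOWED_TAGS.has(node.tag)) {
      kept.push({
        kind: "element",
        tag: node.tag,
        attrs: node.attrs.filter(([attrName]) => ALLOWED_ATTRS.has(attrName)),
        children: sanitise(node.children),
      });
    }
    // Disallowed elements are dropped together with their subtree.
  }
  return kept;
}

/** Encoding: represent text inside HTML syntax. Lossless, no filtering involved. */
function encodeHtmlText(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

/** Inverse of encodeHtmlText, to show that encoding loses nothing. */
function decodeHtmlText(encoded: string): string {
  return encoded.replace(/&lt;/g, "<").replace(/&gt;/g, ">").replace(/&amp;/g, "&");
}

// Sanitise without encoding: the onerror attribute and the script element are gone.
const untrustedTree: TreeNode[] = [
  { kind: "element", tag: "img", attrs: [["src", "/"], ["onerror", "alert('pwnd')"]], children: [] },
  { kind: "element", tag: "script", attrs: [], children: [{ kind: "text", text: "alert(1)" }] },
];
const cleanTree = sanitise(untrustedTree);

// Encode without sanitising: every character survives the round trip.
const original = "Alpha & Beta <script>";
const roundTripped = decodeHtmlText(encodeHtmlText(original));
```

Doing both in sequence would mean running `sanitise` on the tree and then serialising the result with an encoder applied to each text node and attribute value.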
|
|
|
|
|
|
|
|
|