Remix.run Logo
jdsleppy 3 days ago

Where do you suggest we sanitize values? Only in the client, when rendering them?

chrismorgan 3 days ago | parent [-]

Depends on what you mean by sanitising.

If you mean filtering out undesirable parts of a document (e.g. disallowing <script> element or onclick attribute), that should normally be done on the server, before storage.

If instead you mean serialising, writing a value into a serialised document: then this should be done at the point you’re creating the serialised document. (That is, where you’re emitting the HTML.)

But the golden standard is not to generate serialised HTML manually, but to generate a DOM tree, and serialise that (though sadly it’s still a tad fraught because HTML syntax is such a mess; it works better in XML syntax).

This final point may be easier to describe by comparison to JSON: do you emit a JSON response by writing `{`, then writing `"some_key":`, then writing `[`, then writing `"\"hello\""` after carefully escaping the quotation marks, and so on? You can, but in practice it’s very rarely done. Rather, you create a JSON document, and then serialise it, e.g. with JSON.stringify inside a browser. In like manner, if you construct a proper DOM tree, you don’t need to worry about things like escaping.

juliend2 3 days ago | parent [-]

What's wrong about filtering before saving, is that if you forget about one rule, you have to go back and re-filter already-saved data in the db (with some one-off script).

I think "normally" we should instead filter for XSS injections when we generate the DOM tree, or just before (such as passing backend data to the frontend, if that makes more sense).

zdragnar 3 days ago | parent [-]

Don't forget that different clients or view formats (apps, export to CSV, etc) all have their own sanitization requirements.

Sanitize at your boundaries. Data going to SQL? Apply SQL specific sanitization. Data going to Mongo? Same. HTML, JSON, markdown, CSV? Apply the view specific sanitizing on the way.

The key difference is that, if you deploy a JSON API that is view agnostic, that the client now needs to apply the sanitization. That's a requirement of an agnostic API.

chrismorgan 2 days ago | parent [-]

Please don’t use the word sanitising for what you seem to be describing: it’s a term more commonly used to mean filtering out undesirable parts. Encoding for a particular serialised format is a completely different, and lossless, thing. You can call it escaping or encoding.

zdragnar 2 days ago | parent [-]

Sanitizing is just a form of encoding that prevents data from becoming executable unintentionally.

chrismorgan 2 days ago | parent [-]

I don’t like how you’re categorising things. Sanitising is absolutely nothing to do with encoding. You can sanitise without encoding, you can encode without sanitising, or you can do both in sequence; and all of these combinations are reasonable and common, in different situations. And sanitising may operate on serialised HTML (risky), or on an HTML tree (both easier and safer).

Saying sanitising is a form of encoding is even less accurate than saying that a paint-mixing stick is a type of paint brush. You can mix paint without painting it, and you can paint without mixing it first.