The problem with filtering before saving is that if you forget a rule, you have to go back and re-filter the data already saved in the DB (with some one-off script).

I think "normally" we should instead filter for XSS injections when we generate the DOM tree, or just before (such as when passing backend data to the frontend, if that makes more sense).
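A rough sketch of what I mean, assuming a Python backend; `stored_comments` and `render_comment` are made-up names for illustration, not from any framework. The data is saved exactly as submitted and only escaped when it is turned into HTML:

```python
from html import escape

# Hypothetical stored comments, saved exactly as submitted (no filtering at write time).
stored_comments = [
    "Nice post!",
    '<script>alert("xss")</script>',
    "5 < 6 && 7 > 2",
]


def render_comment(text: str) -> str:
    """Escape at output time, just before the text becomes part of the HTML."""
    return f"<p>{escape(text)}</p>"


# Every stored row goes through the same output rule, old data included.
rendered = [render_comment(comment) for comment in stored_comments]
for line in rendered:
    print(line)
```

If the output rule turns out to be wrong, fixing `render_comment` covers everything already in the DB, so no one-off script is needed.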

▲

zdragnar 2 months ago | parent [-]

Don't forget that different clients or view formats (apps, export to CSV, etc) all have their own sanitization requirements.

Sanitize at your boundaries. Data going to SQL? Apply SQL-specific sanitization. Data going to Mongo? Same. HTML, JSON, markdown, CSV? Apply the view-specific sanitizing on the way out.
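As a minimal sketch of the idea, using only the Python standard library: the same raw value crosses four boundaries and gets a different treatment at each one. The `notes` table and the sample value are assumptions made up for the example.

```python
import csv
import io
import json
import sqlite3
from html import escape, unescape

value = 'O\'Brien <b>"hi"</b>, ok'  # hypothetical user input

# SQL boundary: a bound parameter, so the value is never spliced into the SQL text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES (?)", (value,))
stored = conn.execute("SELECT body FROM notes").fetchone()[0]

# HTML boundary: entity-escape the markup-significant characters.
html_text = escape(stored)

# JSON boundary: the serializer handles quoting and backslashes.
json_text = json.dumps({"body": stored})

# CSV boundary: the writer quotes the field and doubles embedded quotes.
buf = io.StringIO(newline="")
csv.writer(buf).writerow([stored])
csv_text = buf.getvalue()

print(html_text)
print(json_text)
print(csv_text, end="")
```

The database keeps the raw value throughout; each output format applies its own rule on the way out, and none of them would be correct for the others.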

The key difference is that if you deploy a view-agnostic JSON API, the client now needs to apply the sanitization. That's a requirement of an agnostic API.

▲

chrismorgan 2 months ago | parent [-]

Please don’t use the word sanitising for what you seem to be describing: it’s a term more commonly used to mean filtering out undesirable parts. Encoding for a particular serialised format is a completely different, and lossless, thing. You can call it escaping or encoding.
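To make the distinction concrete, here is a toy sketch in Python. `html.escape` and `html.unescape` are standard library; the `AllowlistSanitiser` class, its `ALLOWED_TAGS` set and the `sanitise` helper are hypothetical and far too naive for production use. Escaping round-trips losslessly, while sanitising throws parts of the input away.

```python
from html import escape, unescape
from html.parser import HTMLParser

ALLOWED_TAGS = {"b", "i", "em", "strong"}  # hypothetical allowlist
DROP_CONTENT = {"script", "style"}  # drop these elements together with their text


class AllowlistSanitiser(HTMLParser):
    """Toy sanitiser: keeps text and allowlisted tags (minus attributes)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self._skip += 1
        elif tag in ALLOWED_TAGS and not self._skip:
            self.out.append(f"<{tag}>")  # attributes are discarded

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self._skip = max(0, self._skip - 1)
        elif tag in ALLOWED_TAGS and not self._skip:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip:
            # Text was decoded by the parser, so re-serialising it needs escaping.
            self.out.append(escape(data, quote=False))


def sanitise(markup: str) -> str:
    parser = AllowlistSanitiser()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


raw = 'Hi <b>there</b><script>alert(1)</script> <a href="javascript:x()">link</a>'

encoded = escape(raw)    # lossless: unescape(encoded) gives raw back
cleaned = sanitise(raw)  # lossy: the script and the link are gone for good

print(encoded)
print(cleaned)
```

The encoded form is meant to be displayed as literal text; the sanitised form is meant to be inserted as markup. They answer different questions.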

▲

zdragnar 2 months ago | parent [-]

Sanitizing is just a form of encoding that prevents data from becoming executable unintentionally.

▲

chrismorgan 2 months ago | parent [-]

I don’t like how you’re categorising things. Sanitising is absolutely nothing to do with encoding. You can sanitise without encoding, you can encode without sanitising, or you can do both in sequence; and all of these combinations are reasonable and common, in different situations. And sanitising may operate on serialised HTML (risky), or on an HTML tree (both easier and safer). Saying sanitising is a form of encoding is even less accurate than saying that a paint-mixing stick is a type of paint brush. You can mix paint without painting it, and you can paint without mixing it first.