rafram 7 months ago

Sanitizing your strings immediately before display is all well and good until you need to pass them to some piece of third-party software that is very dumb and doesn’t sanitize them. You’ll argue that it’s the vendor’s fault, but the vendor will argue that nobody else allows characters like that in their name inputs!

See the Companies House XSS injection situation, where their rationale for forcing a business to change its name was that others using their database could be vulnerable: https://www.theregister.com/2020/10/30/companies_house_xss_s...

arkh 7 months ago | parent | next [-]

You sanitize at the frontier of what your code controls.

Sending data to a database: parametrized queries to sanitize as it is leaving your control.

Sending to display to the user: sanitize for the browser

Sending to an API: sanitize for whatever rules the API has

Sending to a legacy system: sanitize for it

Writing a file to the system: sanitize the path

The common point is you don't sanitize before you have to send it somewhere. And the advantage of this method is that you limit the chances of getting bit by reflected injections. You interrogate some API you don't control, you may just get malicious content, but you sanitize when sending it so all is good. Because you're sanitizing on output and not on input.
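A minimal Python sketch of the "sanitize on output" idea above, using only the standard library. The `users` table, the function names, and the sample string are hypothetical and chosen only for illustration; the raw value is stored untouched and is encoded separately for each destination.

```python
import html
import sqlite3
from pathlib import Path


def store_name(conn: sqlite3.Connection, name: str) -> None:
    # Database boundary: a parameterized query keeps the value out of the SQL text.
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def render_name(name: str) -> str:
    # Browser boundary: escape HTML metacharacters only when building markup.
    return "<p>" + html.escape(name) + "</p>"


def safe_path(base: Path, filename: str) -> Path:
    # Filesystem boundary: reject names that resolve outside the base directory.
    # Path.is_relative_to requires Python 3.9 or later.
    base = base.resolve()
    candidate = (base / filename).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError("path escapes base directory")
    return candidate


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
raw = "Robert'); DROP TABLE users;--<script>"
store_name(conn, raw)
stored = conn.execute("SELECT name FROM users").fetchone()[0]
```

The stored value round-trips unchanged, and the same raw string can later be passed through `render_name` or any other boundary-specific encoder without having been altered at input time.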

account42 7 months ago | parent | next [-]

What if the legacy API doesn't support escaping? Just drop characters? Implement your own ad-hoc transform? What if you need to interoperate with other API users?

Limiting the character set at name input gives the user the chance to use the same ASCII encoding of their name in all places.
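One way to read the suggestion above, as a rough Python sketch: reject non-ASCII input at entry and let the user pick their own spelling, rather than transliterating silently. The allowed character set and the `validate_name` helper are assumptions made up for this example, not a recommended standard.

```python
import re

# Hypothetical allow-list: ASCII letters, then letters, spaces, apostrophes, hyphens.
ASCII_NAME = re.compile(r"[A-Za-z][A-Za-z '\-]*")


def validate_name(name: str) -> str:
    # Trim surrounding whitespace, then require the whole string to match.
    name = name.strip()
    if not ASCII_NAME.fullmatch(name):
        # The user is asked to choose the ASCII form; nothing is dropped silently.
        raise ValueError("please enter your name using ASCII letters only")
    return name
```

With this check, "Mueller" is accepted as typed, while "Müller" is rejected and the user decides whether to enter "Mueller" or "Muller".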

shaky-carrousel 7 months ago | parent | prev [-]

Be liberal in what you accept, and conservative in what you send.

int_19h 7 months ago | parent [-]

Please don't. This is how standards die.

https://datatracker.ietf.org/doc/html/rfc9413

afiori 7 months ago | parent | prev | next [-]

Forbidding users from using your service to propagate "little bobby tables" pseudo-pranks is likely a good choice.

The choice is different if like most apps you are almost only a data sink, but if you are also a data source for others it pays to be cautious.

dcow 7 months ago | parent [-]

I think it’s more of an ethical question than anything. There will always be pranksters and there will never be perfect input validation for names. So who do you oppress? The people with uncommon names? Or the pranksters? I happen to think that if you do your job right, the pranksters aren’t really a problem. So why oppress those with less common names?

afiori 7 months ago | parent | next [-]

I am not saying to only allow [a-zA-Z ]+ in names. What I am saying is that it is OK to block names like "'; drop table users;" or "<script src="https://bad.site.net/"></script>" if part of your business is to distribute that data to other consumers.

dcow 7 months ago | parent [-]

And I’m arguing, rhetorically, what if your name produces a syntax error—or worse means something semantically devious—in the query language I’m using? Not all problems look like script tags and semicolons.

foldr 7 months ago | parent | next [-]

It's a question of intent. There aren't any hard and fast rules, but if someone has chosen their company name specifically in order to cause problems for other people using your service, then it's reasonable to make them change it.

account42 7 months ago | parent | prev [-]

This is getting really absurd. Are you also going to complain that Unicode is too restrictive, or are you going to demand being able to use arbitrary bytes as names? Images? If Unicode is enough, then which version?

There is always a somewhat arbitrary restriction. It's not unreasonable to also take other people into account besides the user wanting to enter his special snowflake name.

account42 7 months ago | parent | prev [-]

No one is being oppressed. Having to use an ASCII version of your name is literally a non-issue unless you WANT to be offended.

Maybe also think of the other humans that will need to read and retype the name. Do you expect everyone to understand and be able to type all characters? That's not reasonable. The best person to normalize the name to something interoperable is the user himself, so make him do it at data entry.

mabster 7 months ago | parent [-]

I was saying the exact same thing about how I don't understand why people get offended when they have to transcribe their name to use Hanzi!

We should have a world vote to settle which alphabet we use.

rob74 7 months ago | parent | prev [-]

> but the vendor will argue that nobody else allows characters like that in their name inputs

...and maybe they will even link to this page to support that statement! But, seeing that most of the pages are German, I bet they do accept the usual German "special" letters (ÄÖÜß) in names?

account42 7 months ago | parent [-]

So? Have you considered that the names may eventually need to be processed by people who understand the German alphabet but not all French accents (and certainly won't be able to type Hanzi or Arabic or whatever else you expect everyone to support)? Will every system they interact with be able to deal with arbitrary symbols? Does the font of their letterhead support every script?

It's reasonable to expect a German company to deal with German script, less reasonable to expect them to deal with literally every script that someone once thought would be funny to include in Unicode.