zzo38computer 2 days ago

For passwords, you might not need to care about the character encoding, since they are not going to be displayed anyways. You should allow any password, and the maximum length should not be too short.

For usernames, I think your point is valid; you might restrict usernames to a subset of ASCII (not arbitrary ASCII; e.g. you might disallow spaces and some punctuation), or use numeric user IDs, while the display name might be less restricted. (In some cases (probably uncommon) you might also use a different character set than ASCII if that is desirable for your application, but Unicode is not a good way to do it.)
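As a rough sketch of what "a subset of ASCII" could look like in practice (the exact policy here is my own assumption, not something from the thread: 3 to 32 characters, ASCII letters, digits, underscore and hyphen only; `USERNAME_RE` and `is_valid_username` are hypothetical names):

```python
import re

# Assumed policy for illustration: 3-32 characters drawn from ASCII letters,
# digits, underscore and hyphen. Spaces and other punctuation are rejected.
USERNAME_RE = re.compile(r"[A-Za-z0-9_-]{3,32}")


def is_valid_username(name: str) -> bool:
    """Return True only if the whole string matches the assumed policy."""
    return USERNAME_RE.fullmatch(name) is not None
```

With this policy, `is_valid_username("zzo38computer")` is accepted, while `"has space"` and `"café"` are rejected. The display name would be validated separately and could be much looser.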

(I also think that Unicode is not good; it is helpful for many applications to have i18n (although you should be aware of which parts should use it and which shouldn't), but Unicode is not a good way to do it.)

numpad0 2 days ago | parent [-]

> For passwords, you might not need to care about the character encoding, since they are not going to be displayed anyways.

That would be reasonable if there were a strict 1:1 correspondence between the intended text and its binary representation, but there isn't. Unicode has the equivalent of British and American spellings for the same text, and users have no control over which one their device produces: precomposed vs. combining characters, variation selectors, etc. Making it the developer's obligation to ensure all of that normalizes into one canonical password string is unreasonable, and just falling back to ASCII is much more reasonable.
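To make the precomposed vs. combining case concrete, here is a minimal sketch using Python's standard `unicodedata` module. The helper names `raw_bytes` and `normalized_bytes` are made up for illustration, and NFC is just one possible choice of normalization form. It only covers this one case; it does nothing about variation selectors or the other issues mentioned above.

```python
import unicodedata

# The same visible text "café", produced two different ways.
precomposed = "caf\u00e9"   # é as the single code point U+00E9
combining = "cafe\u0301"    # e followed by U+0301 COMBINING ACUTE ACCENT


def raw_bytes(password: str) -> bytes:
    """Encode the password exactly as typed, with no normalization."""
    return password.encode("utf-8")


def normalized_bytes(password: str) -> bytes:
    """Normalize to NFC before encoding (an assumed policy, not a standard one)."""
    return unicodedata.normalize("NFC", password).encode("utf-8")


# Without normalization the two inputs hash differently, so a user whose
# keyboard emits the other form cannot log in.
assert raw_bytes(precomposed) != raw_bytes(combining)

# After NFC normalization both inputs produce identical bytes.
assert normalized_bytes(precomposed) == normalized_bytes(combining)
```

Whatever form is chosen has to be applied identically at registration and at every login, forever, which is part of why it becomes a developer burden.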

I guess everyone using alphanumeric sequences for every identifier is somewhat imperialistic in a sense, but it's close to the least controversial of the general cultural imperialism problems. It's probably okay to leave it to be solved over the next century or two.