powersnail 18 hours ago

As someone who really thinks the name field should just be one field accepting any printable Unicode characters, I do wonder what the hell I would need to do if I took customer names in this form and my system then had to interact with some other service that requires a first/last name split and/or [a-zA-Z] validation, like a bank or postal service.

Automatic transliteration seems to be very dangerous (wrong name on bank accounts, for instance), and not always feasible (some unicode characters have more than one way of being transliterated).

Should we apologize to the user, and just ask the user twice, once correctly, and once for the bad computer systems? This seems to be the only approach that both respects their spelling and avoids creating conflicts with other systems.

nicbou 16 hours ago | parent | next [-]

We had problems with a Ukrainian refugee we helped because certified translations of her documents did not match. Her name was transliterated the German way in one place and the English way in another.

Those are translations coming from professionals who swore an oath. Don’t try to do it with code.

hyeonwho4 15 hours ago | parent [-]

In the US, you can generally specify to your certified translators how you want proper names and place names written. I would suggest you or your friend talk to the translators again so that everything matches. It will also minimize future pains.

Also, USCIS usually has an "aliases" field on their forms, which would be a good place to put German government misspellings.

77pt77 14 hours ago | parent [-]

USCIS is a mess.

I know someone who still doesn't know whether they have a middle name as far as American authorities are concerned.

Coupled with "two last names" and it gets really messy, really quickly.

Purchase names don't match the CC name.

Bank statements are actually "for another person".

Border crossings are now extra spicy.

And "pray" that your name doesn't resemble a name in some blacklist.

matthewbauer 17 hours ago | parent | prev | next [-]

You can just show the user the transliteration & have them confirm it makes sense. Always store the original version since you can't reverse the process. But you can compare the transliterated version to make sure it matches.

Debit cards are a pretty common example of this. I believe you can only have ASCII in the cardholder name field.
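
A minimal Python sketch of that flow, assuming the target field is ASCII-only as described above. The names `naive_ascii_suggestion`, `CardholderName` and `confirm_card_name` are hypothetical, and the "transliteration" here only strips diacritics, so it is a stand-in for a real transliterator, not one. The point is that the original is stored untouched and the ASCII form is only ever what the user confirmed.

```python
import unicodedata
from dataclasses import dataclass


def naive_ascii_suggestion(name: str) -> str:
    """Hypothetical stand-in for a transliterator: strips diacritics only.

    Characters with no ASCII base letter (e.g. Cyrillic) are dropped, which
    is exactly why the result is a suggestion and not an answer.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.encode("ascii", "ignore").decode("ascii").strip()


@dataclass(frozen=True)
class CardholderName:
    original: str   # exactly what the user typed; never derived or overwritten
    card_name: str  # ASCII form the user explicitly confirmed or corrected


def confirm_card_name(original: str, confirmed_by_user: str) -> CardholderName:
    """Store both forms; only validate the constraint of the target field."""
    card_name = confirmed_by_user.strip()
    if not card_name or not card_name.isascii():
        raise ValueError("card name must be non-empty ASCII")
    return CardholderName(original=original, card_name=card_name)


suggestion = naive_ascii_suggestion("José Müller")  # pre-fills the editable field
record = confirm_card_name("José Müller", suggestion)
```

For a name like "Олена" the suggestion comes back empty, so the user has to type the ASCII spelling themselves, which is arguably the safer outcome.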

Muromec 17 hours ago | parent | next [-]

>But you can compare the transliterated version to make sure it matches

No you can't.

Add: Okay, you need to know why. I'm right here, a living, breathing person with a government ID that has the same name written in two scripts side by side.

There is an algorithm (blessed by the same government that issued said ID) which defines how to transliterate names from one script to the other, published on the parliament web site and implemented in all the places that are involved in the ID issuing business.

The algorithm will however not produce the outcome you will see on my ID, because I, a living, breathing person who has a name, asked nicely to have it spelled the way I like. The next time I visit the ID issuing place, I could forget to ask nicely and then I will have two valid IDs (no, the old one will not be marked as void!) with three names that don't exactly match. It's all perfectly fine, because the name as a legal concept is defined in the character set you probably can't read anyway.

Please, don't try to be smart with names.

lmm 14 hours ago | parent [-]

Your example fails to explain any problem with GP's proposal. They would show you a transliteration of your name and ask you to confirm it. You would confirm it or not. It might match one or other of your IDs (in which case you would presumably say yes) or not (in which case you would presumably say no). What's the issue?

Muromec 14 hours ago | parent [-]

You will compare the transliterated version I provided with the one you have already, it will not match, and then what? Either you tell me I have an invalid name or you just ignore it.

lmm 13 hours ago | parent [-]

I think they were suggesting the opposite order - do an automatic transliteration and offer you the choice to approve or correct it.

But even if the user is entering both, warning them that the transliteration doesn't match and letting them continue if they want is something that pays for itself in support costs.
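
A rough Python sketch of such a non-blocking check, assuming the user typed one spelling and the system has its own suggestion. `mismatch_warning` is a hypothetical helper; it only ever returns a message to show, and never rejects the user's spelling.

```python
from typing import Optional


def _normalize(text: str) -> str:
    # Ignore differences in case and spacing only; anything else is a real difference.
    return " ".join(text.split()).casefold()


def mismatch_warning(user_entered: str, suggested: str) -> Optional[str]:
    """Return a warning to display, or None if the two spellings agree.

    The caller shows the warning and lets the user continue with their
    own spelling; the suggestion is never forced on them.
    """
    if _normalize(user_entered) == _normalize(suggested):
        return None
    return (
        f'You entered "{user_entered}", but we would have expected something '
        f'like "{suggested}". Please double-check, or continue if it is correct.'
    )
```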

8organicbits 15 hours ago | parent | prev [-]

I have an ID that transliterated my name, and included the original, but the original contained an obvious typo. I immediately notified the government official, but they refused to fix it. They assured me that only the transliterated name would be used.

Human systems aren't always interested in avoiding or fixing defects.

koito17 12 hours ago | parent | prev | next [-]

Ask for readings (pronunciations) separately.

For instance, in many Japanese forms, there are dedicated fields for the name and the pronunciation of the name. There are possibly multiple ways to read a name (e.g. 山崎 is either やまざき or やまさき). It is better to ask the person "how to read your name?" rather than execute code to guess the reading.

As for transliteration, it's best to avoid if possible. If not possible, then rely on international standards (e.g. Japanese has ISO 3602 and Arabic has ISO 233-2). When international standards don't exist, then fall back to "context-dependent" standards (e.g. in Taiwan, there are several variants of Pinyin. Allow the user to choose the romanization that matches their existing documentation).
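
As a sketch of what "ask, don't guess" could look like in a data model (Python, with hypothetical names throughout): the written name, the reading, and any romanization are all separate user-supplied fields. The kana check is a simplifying assumption that only accepts characters from the Unicode hiragana and katakana blocks.

```python
from dataclasses import dataclass
from typing import Optional


def is_kana(text: str) -> bool:
    """True if text is non-blank and uses only hiragana/katakana (U+3040..U+30FF) or spaces."""
    return bool(text.strip()) and all(
        "\u3040" <= ch <= "\u30ff" or ch.isspace() for ch in text
    )


@dataclass(frozen=True)
class NameWithReading:
    written: str                               # e.g. kanji, as the user writes it
    reading: str                               # kana, typed by the user, never inferred
    romanized: Optional[str] = None            # as on the user's existing documents
    romanization_scheme: Optional[str] = None  # chosen by the user, if they know it

    def __post_init__(self) -> None:
        if not is_kana(self.reading):
            raise ValueError("reading must be given in kana")


# Same written name, two different people, two different readings.
a = NameWithReading(written="山崎", reading="やまざき")
b = NameWithReading(written="山崎", reading="やまさき")
```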

junek 16 hours ago | parent | prev | next [-]

The fundamental mistake is in trying to take input for one purpose and transform it for another purpose. Just have the user fill in an additional field for their name as it appears on bank statements, or whatever the second purpose is. Trying to be clever about this stuff never works out.
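
One way to read that, sketched in Python with made-up names (`CustomerName`, `set_for`, `name_for`, and the purpose keys are all hypothetical): keep one spelling per purpose, each entered by the user, and refuse to derive a missing one from another.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CustomerName:
    display: str                                            # how the user wants to be addressed
    by_purpose: Dict[str, str] = field(default_factory=dict)  # e.g. "bank" -> spelling on statements

    def set_for(self, purpose: str, spelling: str) -> None:
        """Record the spelling the user gave for one specific purpose."""
        self.by_purpose[purpose] = spelling

    def name_for(self, purpose: str) -> str:
        """Return the spelling for a purpose; never fall back to guessing."""
        try:
            return self.by_purpose[purpose]
        except KeyError:
            raise LookupError(
                f"no spelling on file for {purpose!r}; ask the user"
            ) from None


customer = CustomerName(display="Олена Шевченко")
customer.set_for("bank", "OLENA SHEVCHENKO")
```

A missing purpose raises instead of silently reusing the bank spelling, so the gap surfaces as a question to the user.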

layer8 15 hours ago | parent [-]

What you call the second purpose is often the only purpose. Or you have to talk to half a dozen other systems each of which have different constraints. You wouldn’t want to present the user half a dozen fields just so that they can choose the nicest representation of their name for each system.

That being said, in Japan it’s actually common to have two fields, one for the name in kanji (the “nice” name), and one in katakana (restricted, 8-bit phonetic alphabet, which earlier/older computer systems used and some probably still use).

Muromec 14 hours ago | parent [-]

You usually don't have a dozen, just two or three. And if you do have a dozen, there is a certain pattern, or at least a common denominator: half of them take ASCII, and the other half use some kind of local convention you totally know how to encode.

Muromec 18 hours ago | parent | prev | next [-]

Okay, I have a non-ASCII (non-Latin even) name, so I can tell. You just ask explicitly how my name is spelled in a bank system or on my government ID. Please don't try transliteration unless you know the exact rules the other system uses to transliterate my name from one cultural context into another, and even then make it a suggestion and make it clear for which purpose it will be used (and then only use it for that purpose).

And please please please, don't try to be smart and detect the cultural context from the character set before automatically translating it to another character set. It will go wrong and you will not notice for a long time, but people will make mean passive aggressive screenshots of your product too.

My bank, for example, knows my legal name in Cyrillic but will not print it on a card, so they make a best-effort attempt to transliterate it to ASCII, but make it an editable field and ask me to confirm this is how I want it to be on the card.

teaearlgraycold 16 hours ago | parent | prev [-]

Legal name vs. display name

deathanatos 12 hours ago | parent [-]

… "legal name" is "things programmers believe about names" grade. Maybe (name, jurisdiction), but I've seen exceptions to that, too.

Where I live, no less than 3 jurisdictions have a say about my "legal" name, and their laws do not require them to match. At one point, one jurisdiction had two different "legal" names for me, one a typo by my standards, but AFAICT, both equally valid.

There's no solution here, AFAICT, it's just evidence towards why computers cannot be accountability sinks.