Muromec 7 months ago

Quite the opposite actually. I want it stored correctly and in a way that both me and CSR can understand and so it can be used to interface with other systems.

I don’t, however, know which Unicode subset to use, because you didn’t tell me in the signup form. I have many options, all of them correct, but I don’t know whether your CSR can read Ukrainian Cyrillic, or whether you can tell what the vocative case is and avoid using it when interfacing with the government CA, which expects the nominative.

ACS_Solver 7 months ago

I think you're touching on another problem, which is that we as users rarely know why the form wants a name. Is it to be used in emails, or for sending packages, or for talking to me?

My language also has a separate vocative case, but I live in a country that has no concept of it and just vestiges of a case system. I enter my name in the nominative, which then of course looks weird if I get emails/letters from them later - they have no idea they should use the vocative. If I knew the form was just for sending me emails, I'd maybe enter my name in the vocative.

Engineers, or UX designers, or whoever does this, like to pretend names are simple. They're just not (obligatory reference to the "falsehoods about names" article). There are many distinct cases for why you may want my name and they may all warrant different input.

- Name to use in letters or emails. It doesn't matter if a CSR can pronounce this if it's used in writing, it should be a name I like to see in correspondence. Maybe it's in a script unfamiliar to most CSRs, or maybe it's just a vocative form.

- Name for verbal communication. Just about anything could be appropriate depending on the circumstances. Maybe an anglicized name I think your company will be able to pronounce, maybe a name in a non-Latin script if I expect it to be understood here, maybe a name in a Latin-extended script if I know most people will still say it reasonably well intuitively. But it could also be an entirely different name from the written one if I expect the written one to be butchered.

- Name for package deliveries. If I'm ordering a package from abroad, I want my name (and address) written in my local convention - I don't care if the vendor can't read it, first the package will make its way to my country using the country and postal code identifiers, and then it should have info that makes sense to the local logistics companies, not to the seller's IT system.

- Legal name because we're entering a contract or because my ID will be checked later on for some reason.

- Machine-readable legal name for certain systems like airlines. For most of the world's population, this is not the same as the legal name but of course English-language bias means this is often overlooked.
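To make the list above concrete, here's a rough sketch of a record that keeps one name per purpose instead of a single "name" field. The class and every field name are made up for illustration, and which fields a real form should ask for depends on what the name is actually used for.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CustomerName:
    # All field names are hypothetical; each maps to one use case in the list.
    correspondence: str                     # written form for letters/emails
    spoken: Optional[str] = None            # what a CSR should say aloud
    delivery: Optional[str] = None          # local-convention form for labels
    legal: Optional[str] = None             # as written on the ID or contract
    machine_readable: Optional[str] = None  # form used by systems like airlines

    def for_shipping_label(self) -> str:
        # Fall back to the correspondence name if no delivery form was given.
        return self.delivery or self.correspondence


name = CustomerName(correspondence="Rūta Lāse")
print(name.for_shipping_label())
```

The point of the sketch is only that the form tells me which field I'm filling in, so I can pick the right variant myself.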

pezezin 7 months ago

> - Name for package deliveries. If I'm ordering a package from abroad, I want my name (and address) written in my local convention - I don't care if the vendor can't read it, first the package will make its way to my country using the country and postal code identifiers, and then it should have info that makes sense to the local logistics companies, not to the seller's IT system.

I am not sure that what you ask is possible; there might be local or international regulations that force them to write all the addresses in a certain way.

But on the positive side, I have found that nowadays most online shops provide a free-form field for additional delivery instructions. I live in Japan, and whenever I order something from abroad I write my address in Japanese, and most sellers are nice enough to print it and put it on the side of the box, to make the life of the delivery guys easier.

account42 7 months ago

The thing is "printable ASCII letters" is something usable for all of those cases. It may not be 100% perfect for the user's feelings but it just works.

ACS_Solver 7 months ago

This is patently wrong and it's the sort of thinking that still causes inconvenience to people using non-ASCII languages, years after it stopped being technically justifiable.

The most typical problem scenario is getting some package or document with names transformed to ASCII and then being unable to actually receive the package or use the document because the name isn't your name. Especially when a third party is involved that doesn't speak the language that got mangled either.

Åke Källström is not the same name as Ake Kallstrom. Domestically the latter just looks stupid but then you get a hotel booking with that name, submit it as part of your visa application and the consulate says it's invalid because that's not your name.

Or when Rūta Lāse gets some foreign document or certificate, nobody in her country treats it as authentic because the name written is Ruta Lase, which is also a valid and existing name - but a different one. She ends up having to request another document that establishes the original one is issued to her, and paying for an apostille on that so the original ASCII document is usable. While most languages have a standard way of changing arbitrary text to ASCII, the conversion function is often not bijective even for Latin-based alphabets.
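A quick way to see the non-bijective part, as a minimal sketch: the `ascii_fold` helper below is my own naive diacritic-stripping function (decompose, then drop combining marks), not any language's official transliteration scheme, but it's roughly what a lot of systems do in practice.

```python
import unicodedata


def ascii_fold(text: str) -> str:
    # Decompose accented letters (NFKD), then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Two different names collapse to the same string, so the fold has no inverse.
print(ascii_fold("Rūta Lāse") == ascii_fold("Ruta Lase"))  # True
print(ascii_fold("Åke Källström"))  # Ake Kallstrom
```

Once the output is stored, there is no way to tell which of the two people the record originally belonged to.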

These are real examples of real problems people still encounter because lots of English-speaking developers insist everyone should deal with an ASCII-fied version of their language. In the past I could certainly understand the technical difficulties, but we're some 20-25 years past the point where common software got good Unicode support. ASCII is no longer the only simple solution.

dgfitz 7 months ago

In this specific case, it seems like your concerns are a hypothetical, no?

swiftcoder 7 months ago

Not really, no. A lot of us only really have to deal with English-adjacent input (i.e. European languages that share the majority of character forms with English, or cultures that explicitly Anglicise their names when dealing with English folks).

As soon as you have to deal with users with a radically different alphabet/input-method, the wheels tend to come off. Can your CSR reps pronounce names written in Chinese logographs? In Arabic script? In the Hebrew alphabet?

cowsandmilk 7 months ago

You can analyze the name and direct a case to a CSR who can handle it. May be unrealistic for a 1-2 person company, but every 20+ person company I’ve worked at has intentionally hired CSRs with different language abilities.
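As a rough sketch of what "analyze the name" could mean, assuming all you want is the writing system: the `scripts_in` helper is hypothetical and leans on the first word of each letter's Unicode character name as a script label. It tells you the script only, so a Ukrainian and a Bulgarian name both come back as CYRILLIC; it says nothing about which language the person wants to be addressed in.

```python
import unicodedata


def scripts_in(name: str) -> set[str]:
    # Use the first word of each letter's Unicode character name as a rough
    # script label, e.g. "CYRILLIC CAPITAL LETTER TE" -> "CYRILLIC".
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in name
        if ch.isalpha()
    }


print(scripts_in("Тарас"))
print(scripts_in("Åke Källström"))
```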

Muromec 7 months ago

First off, no, you can't infer language preference from a name. The reasonable and well-meaning assumption about my name, on a good day, makes me only sad and irritated.

And even if you could, I can't tell whether you actually do it just by looking at what your signup form asks me to input.

michaelt 7 months ago

A requirement to do that is an extremely broad definition of "treat strings as opaque blobs most of the time" IMHO :)

int_19h 7 months ago

For one thing, this concern applies equally to names written entirely in Latin script. Can your CSR reps correctly pronounce a French name? How about Polish? Hungarian?

In any case, the proper way to handle this is to store the name as originally written, and have the app that CSRs use provide a phonetic transcription. Coincidentally, this kind of stuff is something that LLMs are very good at already (but I bet you could make it much more efficient by training a dedicated model for the task).

account42 7 months ago

This situation is not the same at all. The CSR might mangle a name in Latin script but can at least attempt to pronounce it and will end up doing so in a way that the user can understand.

Add to that the fact that native speakers of languages written in non-Latin scripts are already used to this.

For better or worse, English and therefore the basic Latin script is the lingua franca of the computing age. Having something universal for international communication is very useful.

GoblinSlayer 7 months ago

FWIW, proquint encoding allows you to pronounce any sequence of bits, though the need for pronunciation eludes me, just copypaste it.
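For anyone who hasn't seen it, a minimal sketch of the encoder: each 16-bit chunk becomes a five-letter consonant-vowel-consonant-vowel-consonant word, using 4 bits per consonant and 2 bits per vowel. The function names are mine, and padding odd-length input with a zero byte is a choice of this sketch, not something I'm claiming the proposal specifies.

```python
CONSONANTS = "bdfghjklmnprstvz"  # 16 symbols, 4 bits each
VOWELS = "aiou"                  # 4 symbols, 2 bits each


def proquint_word(value: int) -> str:
    # Encode one 16-bit value as consonant-vowel-consonant-vowel-consonant.
    if not 0 <= value <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    return (
        CONSONANTS[(value >> 12) & 0xF]
        + VOWELS[(value >> 10) & 0x3]
        + CONSONANTS[(value >> 6) & 0xF]
        + VOWELS[(value >> 4) & 0x3]
        + CONSONANTS[value & 0xF]
    )


def proquint(data: bytes) -> str:
    # Pad odd-length input with a zero byte (an assumption of this sketch).
    if len(data) % 2:
        data += b"\x00"
    words = [
        proquint_word(int.from_bytes(data[i:i + 2], "big"))
        for i in range(0, len(data), 2)
    ]
    return "-".join(words)


# The IPv4 address 127.0.0.1 as four bytes.
print(proquint(bytes([127, 0, 0, 1])))  # lusab-babad
```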