crazygringo a day ago

If you just use the {Alphabetic} Unicode character class (100K code points), together with a space, hyphen, and maybe comma, that might get you close. It includes diacritics.

I'm curious if anyone can think of any other non-alphabetic characters used in legal names around the world, in other scripts?

I wondered about numbers, but the most famous example of that has been overturned:

"Originally named X Æ A-12, the child (whom they call X) had to have his name officially changed to X Æ A-Xii in order to align with California laws regarding birth certificates."

(Of course I'm not saying you should do this. It is fun to wonder though.)
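For anyone who wants to poke at the idea, here is a minimal Python sketch. Python's standard library does not expose the Alphabetic property directly, so this approximates it with the general categories L*, M* and Nl; that is an assumption, and the real property is defined somewhat differently. The names `is_roughly_alphabetic` and `looks_like_name` are made up for illustration.

```python
import unicodedata

# Extra characters proposed above: space, hyphen, comma.
EXTRA_ALLOWED = {" ", "-", ","}


def is_roughly_alphabetic(ch: str) -> bool:
    """Approximate Unicode's Alphabetic property via general categories.

    Assumption: letters (L*), combining marks (M*) and letter numbers (Nl)
    are close enough; this is not the exact Alphabetic definition.
    """
    cat = unicodedata.category(ch)
    return cat[0] in ("L", "M") or cat == "Nl"


def looks_like_name(text: str) -> bool:
    """True if every character is roughly alphabetic or explicitly allowed."""
    return bool(text) and all(
        is_roughly_alphabetic(ch) or ch in EXTRA_ALLOWED for ch in text
    )


print(looks_like_name("X Æ A-Xii"))  # True: letters, spaces, hyphen
print(looks_like_name("X Æ A-12"))   # False: digits are not alphabetic
```

Anything outside the letter-like categories and the three extra characters fails, so a name with an ASCII apostrophe would be rejected by this sketch as written.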

Seb-C a day ago | parent | next [-]

> I'm curious if anyone can think of any other non-alphabetic characters used in legal names around the world, in other scripts?

Latin characters are NOT allowed in official names for Japanese citizens. It must be written in Japanese characters only.

For foreigners living in Japan, it's quite common for their official name in Latin script to fail the validation rules of many online forms, either because of forbidden characters or because it's too long: Japanese names (family name + first name) are typically only 4 characters long.

Also, when you get a visa to Japan, you have to bend and deform the pronunciation of your name to make it fit into the (limited) Japanese syllabary.

Funnily enough, they even had to register a whole new Unicode range at some point, because old administrative documents sometimes contain characters that were deprecated more than a century ago.

https://ccjktype.fonts.adobe.com/2016/11/hentaigana.html

crazygringo a day ago | parent [-]

Very interesting about Japan!

To be clear, I wasn't thinking about within a specific country though.

More like, what is the set of all characters that are allowed in legal names across the world?

You know, to eliminate things like emoji, mathematical symbols, and so forth.

Seb-C a day ago | parent [-]

Ah, I see.

I don't know, but I would bet that the sum of all corner cases and exceptions in the world would make it pretty hard to confidently eliminate any "obvious" characters.

From a technical standpoint, Unicode emoji are probably safe to exclude, but on the other hand, some scripts like Chinese characters are fundamentally pictograms, which is semantically not so different from an emoji.

Maybe after centuries of evolution we will end up with a legit universal language based on emojis, and people named with it.

crazygringo a day ago | parent [-]

Chinese characters are nothing like emoji. They are more akin to syllables. There is no semantic similarity to emoji at all, even if they were originally derived from pictorial representations.

And they belong to the {Alphabetic} Unicode class.

I'm mostly curious if Unicode character classes have already done all the hard work.

poizan42 a day ago | parent | prev | next [-]

You forgot the apostrophe, which is common in Irish names like O’Brien.

bloak 18 hours ago | parent [-]

Yes, though O’Brien is Ó Briain in Irish, according to Wikipedia. I think the apostrophe in Irish names was added by English speakers, perhaps by analogy with "o'clock", perhaps to avoid writing something that would look like an initial.

There are also English names of Norman origin that contain an apostrophe, though the only example I can think of immediately is the fictional d'Urberville.

nicoburns a day ago | parent | prev | next [-]

The apostrophe is common in surnames in some parts of the world.

jlhwung 5 hours ago | parent | prev | next [-]

https://en.wikipedia.org/wiki/Perri_6

lmm 14 hours ago | parent | prev | next [-]

> I'm curious if anyone can think of any other non-alphabetic characters used in legal names around the world, in other scripts?

Some Japanese names are written with Japanese characters that do not have Unicode codepoints.

(The Unicode consortium claims that these characters are somehow "really" Chinese characters just written in a different font; holders of those names tend to disagree, but somehow the programmer community that would riot if someone suggested that people with ø in their name shouldn't care when it's written as o accepts that kind of thing when it comes to Japanese).

crazygringo 3 hours ago | parent [-]

Ha, well I don't think we need to worry about validating characters if they can't be typed in a text box in the first place. ;)

But very interesting thanks!

Mordisquitos 16 hours ago | parent | prev | next [-]

> I'm curious if anyone can think of any other non-alphabetic characters used in legal names around the world, in other scripts?

The Catalan name Gal·la is growing in popularity, with currently 1515 women in the census having it as a first name in Spain with an average age of 10.4 years old: https://ine.es/widgets/nombApell/nombApell.shtml
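For what it's worth, the dot in Gal·la is U+00B7 MIDDLE DOT, which Unicode puts in a punctuation category rather than a letter category, so a letters-only check would reject the name. A small Python sketch using only the standard `unicodedata` module:

```python
import unicodedata

name = "Gal\u00b7la"  # the dot is written as an escape to be unambiguous

for ch in name:
    # Print code point, general category and official character name.
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")

# Collect every character whose general category is not a letter category.
non_letters = [ch for ch in name if not unicodedata.category(ch).startswith("L")]
print(non_letters)
```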

enriquto 4 hours ago | parent [-]

beautiful map of the Catalan Countries when you search for that name here

shash a day ago | parent | prev | next [-]

There’s this individual’s name which involves a clock sound: Nǃxau ǂToma[1]

[1] https://en.m.wikipedia.org/wiki/N%C7%83xau_%C7%82Toma

crazygringo a day ago | parent | next [-]

Click characters are part of {Alphabetic}!

https://en.wikipedia.org/wiki/Click_consonant

https://www.compart.com/en/unicode/category/Lo

https://stackoverflow.com/a/4843363
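A quick way to check this yourself in Python, as a sketch using only the standard library: the click letters U+01C3 and U+01C2 look like punctuation but have general category Lo, unlike the ASCII exclamation mark.

```python
import unicodedata

# U+01C3 resembles "!" and U+01C2 resembles a double dagger, but both are letters.
for ch in ("\u01c3", "\u01c2", "!"):
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")

# With the space removed, the whole name consists of letters.
name_without_space = "N\u01c3xau\u01c2Toma"
print(name_without_space.isalpha())
```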

kens 18 hours ago | parent | prev [-]

> There’s this individual’s name which involves a clock sound: Nǃxau ǂToma

I was extremely puzzled until I realized you meant a click sound, not a clock sound. Adding to my confusion, the vintage IBM 1401 computer uses ǂ as a record mark character.

GolDDranks a day ago | parent | prev | next [-]

What if one's name is not in alphabetic script? Let's say, "鈴木涼太".

crazygringo a day ago | parent [-]

That's part of {Alphabetic} in Unicode. It validates.

golergka a day ago | parent | prev | next [-]

דויד Smith (concatenated) will have an LTR control character in the middle

crazygringo a day ago | parent [-]

Oh that's interesting.

Is that a thing? I've never known of anyone whose legal name used two alphabets that didn't have any overlap in letters at all -- two completely different scripts.

Would a birth certificate allow that? Wouldn't you be expected to transliterate one of them?
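On the control-character point, a small Python sketch of what a validator would run into. The stored value here is hypothetical, built by hand with a LEFT-TO-RIGHT MARK (U+200E) between the two scripts, and `strip_format_chars` is an illustrative helper, not an established recipe.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, an invisible format character

# Hypothetical stored value: a Hebrew given name, an LRM, then a Latin surname.
stored = "\u05d3\u05d5\u05d9\u05d3" + LRM + " Smith"


def strip_format_chars(text: str) -> str:
    """Drop characters in general category Cf (format controls such as LRM)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


print(unicodedata.category(LRM))   # Cf
cleaned = strip_format_chars(stored)
print(len(stored), len(cleaned))   # 11 10
```

One caveat: category Cf also covers the zero-width joiner and non-joiner, which affect how text is rendered in some scripts, so stripping all of Cf is probably too aggressive for real names.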

gus_massa a day ago | parent | prev [-]

Comma or apostrophe, like in d'Alembert?

(And I have 3 on my keyboard, I'm not sure everyone is using the same one.)

ahazred8ta 19 hours ago | parent [-]

Mrs. Keihanaikukauakahihuliheekahaunaele only had a string length problem, but there are people with a Hawaiian ʻokina in their names. U+02BB
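To make the "which apostrophe" question concrete, here is a Python sketch comparing three look-alikes with the standard `unicodedata` module. The labels in the dictionary are informal descriptions chosen for this example. The ʻokina comes out as a modifier letter, while the other two are punctuation.

```python
import unicodedata

candidates = {
    "typewriter apostrophe": "\u0027",
    "right single quotation mark": "\u2019",
    "Hawaiian okina": "\u02bb",
}

for label, ch in candidates.items():
    # Print code point, general category and official character name.
    print(f"{label}: U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")
```

So a letters-only rule would accept the ʻokina but reject both apostrophes, even though all three look nearly identical on screen.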