ajsnigrutin 7 months ago

Yeah, that'll work great..

https://en.wikipedia.org/wiki/%C4%8Celje

echo "Čelje" | uconv -f "UTF-8" -t "UTF-8" -x "Latin-ASCII"

> "Celje"

https://en.wikipedia.org/wiki/Celje

(I mean... we do have postal numbers just for problems like this, but both Štefan and Stefan are not-so-uncommon male names over here, and so are Jozef and Jožef, etc.)
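
To make the collision concrete, here is a minimal Python sketch. It is only a stand-in for the uconv call above: it strips combining marks after NFKD normalization, which is not identical to ICU's Latin-ASCII transform (for example, it leaves "đ" untouched). The helper name strip_diacritics is made up for illustration.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Hypothetical helper: drop combining marks after NFKD decomposition.

    Only approximates ICU's Latin-ASCII transform; letters without a
    canonical decomposition (such as "đ") pass through unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Pairs of distinct names that collapse to the same ASCII string.
pairs = [("Čelje", "Celje"), ("Štefan", "Stefan"), ("Jožef", "Jozef")]
for original, plain in pairs:
    folded = strip_diacritics(original)
    print(original, "->", folded, "| collides:", folded == plain)
```

Once the diacritic is gone, nothing in the output says whether the input was Čelje or Celje.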

jeroenhd 7 months ago

If you're dealing with a bad API that only takes ASCII, "Celje" is usually better than "ÄŒelje" or "蒌elje".

If you have control over the encoding on the input side and on the output side, you should just use UTF-8 or something comparable. If you don't, you have to try to get something useful on the output side.
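
As a rough illustration of where that kind of garbage comes from, the following Python sketch assumes the receiver misreads UTF-8 bytes as Windows-1252 (one common way to end up with "ÄŒelje") and compares it with an ASCII fallback that replaces unknown characters. The variable names are illustrative only.

```python
name = "Čelje"
utf8_bytes = name.encode("utf-8")  # "Č" becomes the two bytes 0xC4 0x8C

# Assumed failure mode: the receiver decodes UTF-8 bytes as Windows-1252.
mojibake = utf8_bytes.decode("cp1252")

# Lossy fallback: every character outside ASCII becomes "?".
replaced = name.encode("ascii", errors="replace").decode("ascii")

print(mojibake)
print(replaced)
```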

ajsnigrutin 7 months ago

This depends.

Everyone over here would know that "ÄŒelje" (?elje) is either čelje, šelje or želje. Maybe even đelje or ćelje if it's a name or something else. So people would take extra care to 'decipher' what was meant.

But if you see "Celje", you assume it's actually Celje (a much larger city than Čelje) and not one of the variants above. And no one will bother figuring out whether part of a letter is missing; it'll just get sent to Celje.

Muromec 7 months ago

Most places where telling Štefan from Stefan is a problem use postal numbers for people too, and/or ask for your DOB.

ajsnigrutin 7 months ago

I don't have a problem telling Štefan from Stefan; 's' and 'š' sound pretty different to everyone around here. But if someone runs that script above and transliterates "š" to "s", it can cause confusion.

And no, we don't use "postal numbers for humans".

Muromec 7 months ago

>And no, we don't use "postal numbers for humans".

I mean an email, a phone number, a tax or social security number, a demographic identifier, a billing/contract number, or a combination of them.

All of those will help you tell Stefan from Štefan in most practical situations.

>But if someone runs that script above and transliterates "š" to "s" it can cause confusion.

It's not nice, and it will certainly make Štefan unhappy, but it's not like you will debit money from the wrong account, deliver to a different address, or contact the wrong customer because of it.
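
A small Python sketch of that argument, using entirely made-up records and field names (customer_id-style keys, a name field, and the ascii_fold helper are all assumptions): after transliteration the two names collide, but a lookup by identifier still resolves to exactly one record.

```python
import unicodedata

def ascii_fold(text: str) -> str:
    """Hypothetical helper: strip combining marks (approximate transliteration)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Made-up records; the key stands in for any stable identifier
# (email, tax number, contract number, ...).
customers = {
    "C-1001": {"name": "Štefan Novak"},
    "C-1002": {"name": "Stefan Novak"},
}

# Matching on the transliterated name is ambiguous...
by_name = [cid for cid, rec in customers.items()
           if ascii_fold(rec["name"]) == "Stefan Novak"]

# ...while matching on the identifier is not.
by_id = customers["C-1001"]["name"]

print(by_name)
print(by_id)
```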

account42 7 months ago

So? Names are not unique to begin with.