jeroenhd 16 hours ago
Computers and localisation weren't relevant back in the early 20th century. The dotless i existed before the dotted i (in Greek script as iota). Some European scholars putting an extra dot on the letter to make it stand out a bit more are as much to blame as the Turks for making the distinction between the different i-vowels clear. Really, this bug is nothing but programmers failing to take into account that not everybody writes in English.
JuniperMesos 14 hours ago
It's not exactly programmers failing to take into account that not everybody writes in English. If that were the case, it would simply be impossible to represent the Turkish lowercase dotless and uppercase dotted I at all. The actual problem is failing to take into account that operations on text strings that work in one language's writing system might not work the same way in a different language's writing system. There are a lot of languages in the world that use the Latin writing system, and even if you are personally a fluent speaker and writer of several of them, you might simply not have learned about Turkish's specific behavior with I.
jagrsw 14 hours ago
> that not everybody writes in English.
I don't know... I understand the history and reasons for this capitalization behavior in Turkish, and my native language isn't English and had to use a lot of strange encodings before the introduction of UTF-8. But messing around with the capitalization of ASCII code points (<= 127) is a risky business, in my opinion. These code points are explicitly named "LATIN CAPITAL LETTER I" and "LATIN SMALL LETTER I", and requiring them to not match exactly during capitalization or lowercasing sounds very risky.
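
For reference, the names quoted above can be checked against the Unicode character database. The following is a minimal Python sketch, not taken from the thread; the helper name `describe` is made up for illustration. It prints the two ASCII letters next to the two Turkish-specific letters, then shows what Python's default (locale-independent) case mapping does with each.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character.

    Hypothetical helper, introduced only for this illustration.
    """
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The two ASCII letters, followed by the two Turkish-specific letters.
for ch in ("I", "i", "\u0130", "\u0131"):
    print(describe(ch))

# Python's built-in case mapping ignores the locale:
print("I".lower() == "i")             # ASCII pair round-trips
print("i".upper() == "I")
print("\u0131".upper() == "I")        # dotless i uppercases to ASCII I
print("\u0130".lower() == "i\u0307")  # dotted capital I becomes i + combining dot
```

Under the default mapping the ASCII pair always matches; the Turkish pairing only appears once a locale-specific mapping is applied on top.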
troad 11 hours ago
> Really, this bug is nothing but programmers failing to take into account that not everybody writes in English.
This bug is the exact opposite of that. The program would have worked fine had it used pure ASCII transforms (±0x20); it was the use of library functions that did in fact take Turkish into account that caused the problem.
More broadly, this is not an easy issue to solve. If a Turkish programmer writes code, what is the expected behaviour for metaprogramming and compilers? Are the function names in English or Turkish? What about variables, object members, struct fields? You could have one variable name that references some government ID number using its native Turkish name, right next to another variable name that uses the English "ID". How does the compiler know what locale to use for which symbol?
Boiling all of this down to 'just be more considerate' is not actually constructive or actionable.
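
To make the contrast concrete, here is a rough Python sketch of the two approaches. It is an illustration under stated assumptions, not the code from the bug being discussed: `ascii_lower`, `turkish_lower`, and the identifier `INLINE` are all hypothetical, and the Turkish mapping is hand-written because Python's `str.lower()` does not consult the locale.

```python
def ascii_lower(s: str) -> str:
    """Lowercase only A-Z by adding 0x20; every other character is untouched."""
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in s)


# Hand-written stand-in for a Turkish-aware library call (assumption):
# capital I maps to dotless i, dotted capital I maps to ASCII i.
TURKISH_LOWER = {"I": "\u0131", "\u0130": "i"}


def turkish_lower(s: str) -> str:
    """Lowercase using the Turkish pairing for I; default mapping otherwise."""
    return "".join(TURKISH_LOWER.get(c, c.lower()) for c in s)


identifier = "INLINE"  # hypothetical symbol that tooling expects as "inline"

print(ascii_lower(identifier))                # inline
print(turkish_lower(identifier))              # ınlıne
print(turkish_lower(identifier) == "inline")  # False: the lookup would miss
```

The ASCII version is deterministic for program symbols regardless of the user's locale, while the locale-aware version is what a Turkish reader would expect for natural-language text. The hard part raised above is that the program cannot tell which of the two a given string is.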