Wouldn't at least the first issue be solved by using Unicode case folding instead of lowercase? Python, for example, has separate .casefold() and .lower() methods, and AFAIK casefold would always turn I into i, and is much more appropriate for this use case.

rhdunn 5 hours ago | parent | next [-]

There are 3 types of case folding:

1. Simple one-to-one mappings -- E.g. `T` to `t`. These are typically the ones handled by `lower()` or similar methods: they work on single characters, so they can modify a string in place (the length of the string doesn't change).

2. More complex one-to-many mappings -- E.g. German `ß` to `ss`. These are covered by functions like `casefold()`. You can't modify the string in place so the function needs to always write to a new string buffer.

3. Locale-specific mappings -- This is what this bug is about. In Turkish `I` maps to `ı`, whereas in other languages/locales it maps to `i`. You can only implement this by passing the locale to the case function, irrespective of whether you are also doing (1) or (2).
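
A rough Python sketch of the three categories, for illustration only. `lower_tr` is a hypothetical helper, not a standard-library function, and it hard-codes just the two Turkish I rules; real code would more likely lean on a locale-aware library.

```python
# (1) Simple one-to-one mapping: the string length is unchanged.
simple = "T".lower()            # 't'

# (2) One-to-many mapping: the folded string can be longer than the input.
folded = "ß".casefold()         # 'ss'

# (3) Locale-specific mapping. Python's str methods take no locale argument,
# so this hypothetical helper hard-codes the Turkish rules:
# U+0049 'I' -> U+0131 'ı' and U+0130 'İ' -> U+0069 'i'.
def lower_tr(text: str) -> str:
    text = text.replace("\u0130", "i").replace("I", "\u0131")
    return text.lower()  # default lowercasing for everything else

default = "ISTANBUL".lower()    # 'istanbul'
turkish = lower_tr("ISTANBUL")  # 'ıstanbul' (dotless ı)
```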

	zahlman 3 hours ago | parent [-]
		This is not quite right, at least for Python. .upper() and .lower() (and .casefold() as well) implement the default casing algorithms from the Unicode specification, which are one-to-many (but still locale-naive). Other languages, meanwhile, might well implement locale-aware mapping that defaults to the system locale rather than requiring a locale to be passed.
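
A quick way to see this in Python 3, as a small sketch (the variable names are only for illustration): `.upper()` and `.lower()` can both change the length of a string, and neither takes or consults a locale.

```python
# Python 3 str methods apply Unicode's full (one-to-many) default mappings.
upper_sharp_s = "ß".upper()         # 'SS': one code point becomes two
upper_ligature = "\ufb01".upper()   # 'ﬁ' (U+FB01) becomes 'FI'
lower_dotted_i = "\u0130".lower()   # 'İ' becomes 'i' + U+0307 (combining dot above)

# Locale-naive: there is no locale parameter, so 'I' always lowercases to 'i'.
lower_plain_i = "I".lower()         # 'i'
```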

zahlman 3 hours ago | parent | prev [-]

Both .casefold() and .lower() in Python use the default Unicode casing algorithms. They're Unicode-aware, but locale-naive. So .lower() also works for this purpose; the point of .casefold() is more about the intended semantics.
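
To illustrate the difference in intent, a minimal sketch: `caseless_equal` is a hypothetical helper name, and it skips Unicode normalization, which a thorough caseless match would also want.

```python
def caseless_equal(a: str, b: str) -> bool:
    # Hypothetical helper: casefold() is meant for comparison, not display.
    return a.casefold() == b.casefold()

# Both methods map 'I' to 'i', with no locale involved.
same_for_i = "I".lower() == "I".casefold() == "i"            # True

# They differ where folding is more aggressive than lowercasing.
lower_match = "Straße".lower() == "STRASSE".lower()          # False
fold_match = caseless_equal("Straße", "STRASSE")             # True
```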

See also: https://stackoverflow.com/questions/19030948 where someone sought the locale-sensitive behaviour.