jrochkind1 10 months ago

The diacritics are there because they were in legacy encodings, and it was decided at some point that text should be round-trippable between legacy encodings and Unicode.

The fact that hardly anyone cares any longer about converting to a legacy non-Unicode encoding is, of course, a testament to the success of Unicode, a success that required not only technical excellence but also figuring out what people would actually adopt in practice. It worked. It's adopted.

I have no idea if the diacritics choice was the right one or not, but I guarantee that if it had been made differently, people would be complaining that things aren't round-trippable from some common legacy encoding to Unicode and back, and that this is part of its problem.

I think some combining diacritics are also necessary for some non-Latin scripts, where it is (or was) infeasible to have a code point for every possible combination.

The choices in Unicode are not random. The fact that it has become universal (so many attempts at standards have not) is a pretty good testament to its success at balancing a bunch of competing values and goals.

int_19h 10 months ago | parent [-]

It's the other way around - precombined characters (with diacritics) are there because they were in legacy encodings. But, assuming that by "generalized diacritics" OP means Unicode combining characters like U+0301, there's nothing legacy about them; on the contrary, the intent is to prefer them over precombined variants, which is why new precombined characters aren't added.
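
To make the distinction concrete, here is a minimal Python sketch using only the standard `unicodedata` module. The choice of "é" and of Latin-1 as the legacy encoding is an assumption for illustration, not something from the thread. It shows that the precombined form and the base-letter-plus-U+0301 form are different code point sequences, that normalization converts between them, and that only the precombined form round-trips through the legacy encoding.

```python
import unicodedata

# Two ways to write "é" (illustrative choice of character).
precombined = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
combining = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT (U+0301)

# They render the same but are different code point sequences.
same_raw = precombined == combining

# Normalization converts between the two forms.
nfc = unicodedata.normalize("NFC", combining)    # composes to U+00E9
nfd = unicodedata.normalize("NFD", precombined)  # decomposes to e + U+0301

# The precombined form round-trips through a legacy encoding (Latin-1 here).
latin1_bytes = precombined.encode("latin-1")
round_trip = latin1_bytes.decode("latin-1")

# The combining form has no Latin-1 representation at all.
try:
    combining.encode("latin-1")
    combining_encodable = True
except UnicodeEncodeError:
    combining_encodable = False

print(same_raw, nfc == precombined, nfd == combining)
print(round_trip == precombined, combining_encodable)
```

In practice, comparing user-visible strings usually means normalizing both sides to the same form first, since either representation can show up in input.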

jrochkind1 10 months ago | parent [-]

Ah, right, thanks!