Remix.run Logo
simonh 10 months ago

No, no, no, no, no… So then we’d get ‘the same’ character with potentially infinite different encodings. Lovely.

Unicode is a coding system, not a glyph system or font.

kristopolous 10 months ago | parent | next [-]

Fonts are already in there and proto-glyphs are too as generalized dicritics. There's also a large variety of generic shapes, lines, arrows, circles and boxes in both filled and unfilled varieties. Lines even have different weights. The absurdity of a custom alphabet can already be partially actualized. Formalism is merely the final step

This conversation was had 20 years ago and your (and my) position lost. Might as well embrace the inevitable instead of insisting on the impossible.

Whether you agree with it or not won't actually affect unicode's outcome, only your own.

simonh 10 months ago | parent | next [-]

Unicode does not specify any fonts, though many fonts are defined to be consistent with the Unicode standard, nevertheless they are emphatically not part of Unicode.

How symbols including diacritics are drawn and displayed is not a concern for Unicode, different fonts can interpret 'filled circle' or the weight of a glyph as they like, just as with emoji. By convention they generally adopt common representations but not entirely. For example try using the box drawing characters from several different fonts together. Some work, many don't.

kristopolous 10 months ago | parent [-]

You can say things like the different "styles" that exploit Unicode on a myriad of websites such as https://qaz.wtf/u/convert.cgi?text=Hello are not technically "fonts" but it's a distinction without a meaningful difference. You have script, fraktur, bold, monospace, italic...

simonh 10 months ago | parent [-]

Fraktur is interesting because it’s more a writing style, verging in a character set in its own right. However Unicode doesn’t directly support all of its ligatures and such.

None of this is in any way justification for turning Unicode into something like SVG. Even the pseudo-drawing capabilities it does have are largely for legacy reasons.

kristopolous 10 months ago | parent [-]

Fraktur at one point was genuinely a different script

You can find texts in the late 1500-early 1900s at least that will switch to a fraktur style when quoting or using German.

ANSI escape codes even accommodates for it. Codepoint 20: https://en.m.wikipedia.org/wiki/ANSI_escape_code#Select_Grap...

Don't ask me why, I only work here.

See also https://en.wikipedia.org/wiki/Antiqua%E2%80%93Fraktur_disput...

I also don't find any of my predictions defensible as much as I believe they're inevitable. Again I've got no agency here.

jrochkind1 10 months ago | parent | prev [-]

The diacritics are there because they were in legacy encodings, and it was decided at some point that encodings should be round-trippable between legacy encodings and unicode.

The fact that hardly anyone cares any longer are about going to any legacy non-unicode encoding is, of course, a testament to the success of unicode, a success that required not only technical excellence but figuring out what would actually work for people to actually adopt practically. It worked. It's adopted.

I have no idea if the diacritics choice was the right one or not, but I guarantee if it had been made differently people would be complaining about how things aren't round-trippable to unicode encoding and back from some common legacy encoding, and that's part of it's problem.

I think some combining diacritics are also necessary for some non-latin scripts, where it is (or was) infeasible to have a codepoint for every possible combination.

The choices in Unicode are not random. The fact that it has become universal (so many attempts at standards have not) is a pretty good testatement to it's success at balancing a bunch of competing values and goals.

int_19h 10 months ago | parent [-]

It's the other way around - precombined characters (with diacritics) are there because they were in legacy encodings. But, assuming that by "generalized diacritics" OP means Unicode combining characters like U+0301, there's nothing legacy about them; on the contrary, the intent is to prefer them over precombined variants, which is why new precombined glyphs aren't added.

jrochkind1 10 months ago | parent [-]

Ah, right, thanks!

numpad0 10 months ago | parent | prev [-]

macOS already does different encoding for filenames in Japanese than what Windows/Linux do, and I'm sure someone mentioned same situation in Korean here.

Unicode is already a non-deterministic mess.

simonh 10 months ago | parent [-]

And that justifies making it an even more complete mess, in new and dramatically worse ways?