kristopolous 10 months ago

There's no argument here.

We could say it's only for scripts and alphabets, ok. It includes many undeciphered writing systems from antiquity with only a small handful of extant samples.

Should we keep that character set, which will very likely never be used, but exclude the extremely popular emojis?

Exclude both? Why? Aren't computers capable enough?

I used to be on the anti-emoji bandwagon but really, it's all indefensible. Unicode is characters of communication at an extremely inclusive level.

I'm sure some day it will also have primitive shapes, and you'll be able to construct your own alphabet using them plus directional modifiers, akin to a generalizable Hangul. In effect it becomes some kind of wacky version of SVG that people will abuse in an ASCII art renaissance.

So be it. Sounds great.

simonh 10 months ago | parent | next [-]

No, no, no, no, no… So then we’d get ‘the same’ character with potentially infinite different encodings. Lovely.

Unicode is a coding system, not a glyph system or font.

kristopolous 10 months ago | parent | next [-]

Fonts are already in there and proto-glyphs are too, as generalized diacritics. There's also a large variety of generic shapes, lines, arrows, circles and boxes in both filled and unfilled varieties. Lines even have different weights. The absurdity of a custom alphabet can already be partially actualized. Formalism is merely the final step.

This conversation was had 20 years ago and your (and my) position lost. Might as well embrace the inevitable instead of insisting on the impossible.

Whether you agree with it or not won't actually affect unicode's outcome, only your own.

simonh 10 months ago | parent | next [-]

Unicode does not specify any fonts. Many fonts are designed to be consistent with the Unicode standard, but they are emphatically not part of Unicode.

How symbols including diacritics are drawn and displayed is not a concern for Unicode; different fonts can interpret 'filled circle' or the weight of a glyph as they like, just as with emoji. By convention they generally adopt common representations, but not entirely. For example, try using the box drawing characters from several different fonts together. Some work, many don't.

kristopolous 10 months ago | parent [-]

You can say things like the different "styles" that exploit Unicode on a myriad of websites such as https://qaz.wtf/u/convert.cgi?text=Hello are not technically "fonts" but it's a distinction without a meaningful difference. You have script, fraktur, bold, monospace, italic...
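
For anyone wondering how those "style" converters work, here is a minimal sketch. It assumes the bold style is just the Mathematical Alphanumeric Symbols block, where bold capitals start at U+1D400 and bold small letters at U+1D41A; `to_math_bold` is a made-up helper name, not anything that site necessarily uses.

```python
import unicodedata

# Start of the contiguous bold runs in the Mathematical Alphanumeric Symbols block.
BOLD_UPPER_BASE = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_LOWER_BASE = 0x1D41A  # MATHEMATICAL BOLD SMALL A


def to_math_bold(text: str) -> str:
    """Map ASCII letters to mathematical bold letters; leave everything else alone."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_UPPER_BASE + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_LOWER_BASE + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


styled = to_math_bold("Hello")
print(styled)
print([unicodedata.name(c) for c in styled])

# Compatibility normalization folds the styled letters back to plain ASCII.
print(unicodedata.normalize("NFKC", styled))
```

So these are separate code points carrying a style, and NFKC treats them as font variants of the plain letters.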

simonh 10 months ago | parent [-]

Fraktur is interesting because it’s more a writing style, verging on a character set in its own right. However Unicode doesn’t directly support all of its ligatures and such.

None of this is in any way justification for turning Unicode into something like SVG. Even the pseudo-drawing capabilities it does have are largely for legacy reasons.

kristopolous 10 months ago | parent [-]

Fraktur at one point was genuinely a different script.

You can find texts from the late 1500s to the early 1900s, at least, that will switch to a fraktur style when quoting or using German.

ANSI escape codes even accommodate it. SGR code 20: https://en.m.wikipedia.org/wiki/ANSI_escape_code#Select_Grap...

Don't ask me why, I only work here.
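
If you want to try it, here is a rough sketch of emitting that code. The `sgr` helper is just a name I made up, and I'm assuming your terminal follows the usual `ESC [ ... m` form; most terminals simply ignore 20, so expect nothing visible to change.

```python
CSI = "\x1b["  # Control Sequence Introducer: ESC followed by "["


def sgr(*params: int) -> str:
    """Build a Select Graphic Rendition sequence from numeric parameters."""
    return CSI + ";".join(str(p) for p in params) + "m"


FRAKTUR = sgr(20)  # the Fraktur code discussed above; rarely implemented
RESET = sgr(0)     # reset all attributes

print(f"{FRAKTUR}Guten Tag{RESET}")
```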

See also https://en.wikipedia.org/wiki/Antiqua%E2%80%93Fraktur_disput...

I also don't find any of my predictions defensible as much as I believe they're inevitable. Again I've got no agency here.

jrochkind1 10 months ago | parent | prev [-]

The diacritics are there because they were in legacy encodings, and it was decided at some point that encodings should be round-trippable between legacy encodings and unicode.

The fact that hardly anyone cares any longer about converting to any legacy non-unicode encoding is, of course, a testament to the success of unicode, a success that required not only technical excellence but also figuring out what people would actually adopt in practice. It worked. It's adopted.

I have no idea if the diacritics choice was the right one or not, but I guarantee that if it had been made differently, people would be complaining about how things aren't round-trippable to unicode and back from some common legacy encoding, and that that's part of its problem.

I think some combining diacritics are also necessary for some non-latin scripts, where it is (or was) infeasible to have a codepoint for every possible combination.

The choices in Unicode are not random. The fact that it has become universal (so many attempts at standards have not) is a pretty good testament to its success at balancing a bunch of competing values and goals.

int_19h 10 months ago | parent [-]

It's the other way around - precombined characters (with diacritics) are there because they were in legacy encodings. But, assuming that by "generalized diacritics" OP means Unicode combining characters like U+0301, there's nothing legacy about them; on the contrary, the intent is to prefer them over precombined variants, which is why new precombined glyphs aren't added.
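
A quick sketch of the distinction, using Python's standard `unicodedata` module (the variable names are only for illustration):

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
combining = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT, two code points

# They render the same but are different code point sequences.
print(precomposed == combining)

# NFC composes where a precombined character exists; NFD always decomposes.
nfc = unicodedata.normalize("NFC", combining)
nfd = unicodedata.normalize("NFD", precomposed)
print(nfc == precomposed, nfd == combining)

# There is no precombined "x with acute", so NFC has to leave this as two code points.
x_acute = unicodedata.normalize("NFC", "x\u0301")
print(len(x_acute))
```

The last case is the point: combinations without a legacy precombined form are only expressible with the combining character.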

jrochkind1 10 months ago | parent [-]

Ah, right, thanks!

numpad0 10 months ago | parent | prev [-]

macOS already uses a different encoding for filenames in Japanese than Windows/Linux do, and I'm sure someone mentioned the same situation in Korean here.
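
A small sketch of what this looks like, assuming the difference in question is decomposed (NFD-style) versus precomposed (NFC) filenames. `same_filename` is a hypothetical helper, not an OS API.

```python
import unicodedata

ga = "\u304c"    # HIRAGANA LETTER GA, precomposed
han = "\ud55c"   # HANGUL SYLLABLE HAN, precomposed

# Decomposed forms: GA becomes KA + combining voiced sound mark,
# and the Hangul syllable becomes three conjoining jamo.
ga_nfd = unicodedata.normalize("NFD", ga)
han_nfd = unicodedata.normalize("NFD", han)
print([f"U+{ord(c):04X}" for c in ga_nfd])
print([f"U+{ord(c):04X}" for c in han_nfd])


def same_filename(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(ga == ga_nfd, same_filename(ga, ga_nfd))
```

A naive string comparison sees two different names, which is where the cross-platform surprises come from.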

Unicode is already a non-deterministic mess.

simonh 10 months ago | parent [-]

And that justifies making it an even more complete mess, in new and dramatically worse ways?

riwsky 10 months ago | parent | prev [-]

Like how phonetic alphabets save space compared to ideograms by just “write the word how it sounds”, the little SVG-icode would just “write the letter how it’s drawn”

kristopolous 10 months ago | parent | next [-]

Right. Semantic iconography need not be universal or even formal to be real.

Think of all the symbols notetakers invent: ideographs without even a phonology assigned to them.

Being as dynamic and flexible as human expression is hard.

Emojis have even taken on this property naturally. The high-5 is also the praying hands for instance. Culturally specific semantics are assigned to the variety of shapes, such as the eggplant and peach.

Insisting that this shouldn't happen is a losing battle against how humans construct written language. Good luck with that.

10 months ago | parent | prev [-]
[deleted]