WorldMaker a day ago
> Emojis and other symbols introduced new notions like colors that were not present before. I'm no longer certain that it is feasible to handcraft a font that contains all the symbols for codepoints affected by color modifiers.

Color modifiers are just ZWJ sequences. Those existed before. The color modifiers themselves are not the most complicated things that get attached to ZWJ sequences among languages that Unicode supports. OpenType today supports color tables that mean most emoji modified by colors aren't "handcrafted" but algorithmically constructed. (As many ligatures and other ZWJ sequences often are.)

> Also, 8 bit codepages, for all their problems (a different kind of hell), didn't break the assumption that each character is encoded as one byte.

That is broken in other 8-bit codepages as well; it was just seen as an exception/edge case rather than the rule. The big obvious exception has always been \r\n (carriage return then newline), but there are also ^H (control-H) and ^W (control-W) sequences (effectively backspace and delete word), and the entire gamut of things done with ANSI and/or VT100 escape sequences starting with Escape, often stylized as ^[.

> And UTF-8 specifically gives the illusion to English speakers that using naive 8 bit string handling works.

Unless emoji are present, which is one of the great things about emoji and emoji becoming a very common form of punctuation in English text.

Naive 8-bit string handling was always wrong. Emoji help make it visible how wrong it was. (In part by doing things other languages do such as ZWJ sequences and having code points out in the Astral Plane and other such features.)
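To make the point about naive 8-bit handling concrete, here is a minimal sketch, assuming Python 3 and only the standard library. The helper names `describe` and `survives_byte_truncation` and the sample strings are illustrative and not from the thread. It compares UTF-8 byte counts with code point counts and shows what a byte-level cut can do to a ZWJ sequence. It does not attempt grapheme cluster segmentation, which is what a "user-perceived character" count would need.

```python
def describe(text):
    """Return (UTF-8 byte count, Unicode code point count) for text."""
    return len(text.encode("utf-8")), len(text)


def survives_byte_truncation(text, n_bytes):
    """Return True if the first n_bytes of the UTF-8 encoding still decode."""
    try:
        text.encode("utf-8")[:n_bytes].decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Sample strings, written with escapes so the code points are explicit.
ascii_word = "hello"
accented = "caf\u00e9"                    # e-acute as one precomposed code point
thumbs_up_toned = "\U0001F44D\U0001F3FD"  # thumbs up + medium skin tone modifier
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # man, ZWJ, woman, ZWJ, girl

for label, sample in [
    ("ascii", ascii_word),
    ("accented", accented),
    ("toned emoji", thumbs_up_toned),
    ("zwj family", family),
]:
    n_bytes, n_points = describe(sample)
    print(f"{label}: {n_bytes} bytes, {n_points} code points")

# Cutting at byte 5 lands inside the 3-byte encoding of U+200D (ZWJ).
print("cut at 5 bytes decodes cleanly:", survives_byte_truncation(family, 5))
```

Only the pure-ASCII sample has one byte per code point. Each of the last two samples is meant to render as a single glyph where the font supports it, yet neither the byte count nor the code point count is 1.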
gmueckl a day ago
So you agree that font rendering had to be extended to support color modifiers as specified in Unicode? That is the kind of complexity creep that I am pointing out.

A bunch of control codes are historically part of character encodings, and their encoding is very consistent within codepages of the same family (ASCII/ANSI and EBCDIC). You don't have to have any awareness of the active codepage/language to handle them correctly.

Terminal escape sequences are a poor form of in-band signaling between devices (now virtualized), not text. I consider that out of scope.

Anyway, as we get into the weeds here, I do not want to dispute the enormous practical utility of Unicode and I am glad that it exists and covers so many of the world's writing systems and alphabets. It is one of the central standards that connects people today. But from the purely technical perspective, the steady complexity creep is undeniable and brings somewhat hidden costs to software systems.