numpad0 3 days ago
It's an upheld decision. The unification is not about reducing the overall character count, but about co-mingling the CJKV languages. Adding more characters does not un-mingle the existing ones.

One thing I feared might happen, and that does seem to be happening, is that Chinese LLMs and AI projects are moving towards Chinese-English bilingual models and away from regular omni-lingual models. I think this is because LLMs would get confused by syntax and dictionary definitions that are invalid in Chinese, and/or generally perform worse, if a substantial amount of non-Chinese CJKV data were included in the dataset.

At the polar opposite end of computing, Hollow Knight: Silksong, released just days ago, is having a Han Unification font problem as well. As you might know, thanks to Han Unification, each CJKV language requires its own font, no two of which can be active at the same time, and characters become mangled unless the application developer spends substantial effort implementing such a non-standard feature. The developers were not aware of that, did not spend the extra effort, and are getting review bombed in China.

It just needs to be reversed. It's a real problem. Adding more obscure characters and obscure features is tangential and not a solution. Different, isolated clusters of character usage need to be separated, not overlapped into one and the same area, just as there is no "GermanFrench-English dictionary".
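To make the font point above concrete, here is a minimal Python sketch, not taken from the thread. It assumes U+76F4 as the example character (a commonly cited ideograph whose preferred shape differs by region), and the `FONT_FOR_LANG` table and `pick_font` helper are hypothetical names invented for illustration. The text carries only the code point, so the language has to be supplied separately before a font can be chosen.

```python
import unicodedata

# U+76F4 is a unified ideograph; the same code point is used for
# Chinese, Japanese, and Korean text.
ch = "\u76f4"
print(unicodedata.name(ch))   # official Unicode character name
print(ch.encode("utf-8"))     # identical bytes whatever the language

# Hypothetical lookup table: the application must know the language
# out-of-band and map it to a region-specific font itself.
FONT_FOR_LANG = {
    "ja": "Noto Sans CJK JP",
    "ko": "Noto Sans CJK KR",
    "zh-Hans": "Noto Sans CJK SC",
    "zh-Hant": "Noto Sans CJK TC",
}

def pick_font(lang_tag: str, default: str = "Noto Sans CJK SC") -> str:
    """Return the font for a language tag (illustrative helper, not a real API)."""
    return FONT_FOR_LANG.get(lang_tag, default)
```

Calling `pick_font("ja")` and `pick_font("zh-Hans")` returns different fonts for what is, at the text level, the same character.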
lifthrasiir 3 days ago
The unification, implemented in Unicode 1.1, is definitely a mechanism for reducing the character count. I'm very sure that if the decision to abandon the 16-bit character set had been made earlier, the unification wouldn't have happened.

And I'm saying this as a CJKV person and past gamedev: each CJKV language requires its own font whether or not Han unification is implemented. There are simply too many glyphs; not just the unified characters, but also common characters that are not considered unified often vary across countries. If you try to account for all those glyph variations in a single font, you just can't cope, because OpenType supports at most 65,536 glyphs in a single typeface. In an alternative universe OpenType might have been extended to allow more glyphs in a single typeface, I don't know, but CJKV characters are simply complex enough to require multiple font files in general. Han unification is of less concern when you have too many glyphs.
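As a rough check of the glyph-budget argument above, the following Python sketch counts the unified ideographs known to the interpreter's bundled Unicode database and compares the total with the 65,536 figure quoted in the comment. The exact count depends on the Unicode version shipped with your Python, and the constant and function names are made up for this example. It counts code points only; regional glyph variants would add to the total.

```python
import unicodedata

# Per-typeface glyph limit as quoted in the comment above.
OPENTYPE_GLYPH_LIMIT = 65_536

def count_unified_ideographs() -> int:
    """Count code points named 'CJK UNIFIED IDEOGRAPH-...' in planes 0 to 3."""
    count = 0
    # Unified ideographs start at U+3400 and extend into planes 2 and 3.
    for cp in range(0x3400, 0x40000):
        name = unicodedata.name(chr(cp), "")
        if name.startswith("CJK UNIFIED IDEOGRAPH-"):
            count += 1
    return count

total = count_unified_ideographs()
print(f"Unicode {unicodedata.unidata_version}: {total} unified ideographs")
print("exceeds one typeface's glyph limit:", total > OPENTYPE_GLYPH_LIMIT)
```

Even with unification in place, the unified ideographs alone outnumber what one typeface can hold, before any punctuation, kana, hangul, or Latin glyphs are counted.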
eviks 3 days ago
> like there are no "GermanFrench-English dictionary"

But there is a single Latin alphabet