numpad0 3 days ago

> not just unified characters, but also common characters that are not considered unified are also often varying across countries.

That's the unification: the issues stem from the CJKV languages not each having their own code points. The issue is not that CJKV text needs multiple font files and that this is cumbersome; the issue is that no two CJKV fonts can be loaded at the same time because they contain conflicting glyphs. Conflicting glyphs. That's just wrong.

lifthrasiir 3 days ago | parent | next [-]

If you somehow want to display, say, both Japanese and Chinese texts at the same time, there is no technical obstacle that prevents you from doing so. Pan-Unicode fonts come with differently named files for CJKV characters, so that is not even difficult. Yes, your assets will have multiple multi-megabyte font files. Is that a problem for modern games? I don't think so.

There is a single circumstance where this is not generally doable: a user name in globally serviced online games. (Guess why I know of this case...) Unless there is a hint that a particular user prefers their user name to be displayed in a certain way, it is difficult to decide which font to use (or even which set of fonts to use). But it's a very niche problem; otherwise you know the language of the text you are showing and can pick the correct font from your assets.
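A minimal sketch of that "pick the font from the language" step, assuming the text arrives with a BCP 47 style language tag. The `FONT_BY_LANG` table, the `pick_font` helper, and the font file names are illustrative assumptions, not something taken from a particular engine:

```python
# Hypothetical asset table: language tag -> font file shipped with the game.
# The file names are placeholders for whatever per-language fonts you bundle.
FONT_BY_LANG = {
    "ja": "NotoSansCJKjp-Regular.otf",
    "ko": "NotoSansCJKkr-Regular.otf",
    "zh-Hans": "NotoSansCJKsc-Regular.otf",
    "zh-Hant": "NotoSansCJKtc-Regular.otf",
}
DEFAULT_FONT = "NotoSans-Regular.ttf"  # assumed fallback for everything else


def pick_font(lang_tag: str) -> str:
    """Return the font for a tag, trying progressively shorter prefixes."""
    parts = lang_tag.split("-")
    while parts:
        key = "-".join(parts)
        if key in FONT_BY_LANG:
            return FONT_BY_LANG[key]
        parts.pop()  # drop the last subtag, e.g. "ja-JP" -> "ja"
    return DEFAULT_FONT


print(pick_font("ja-JP"))
print(pick_font("zh-Hant-TW"))
```

The user name case is exactly the one where no `lang_tag` is available to pass in, which is why it falls outside this approach.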

numpad0 2 days ago | parent [-]

What you've said is correct, but it also means Unicode strings containing CJKV characters become mildly corrupt if decoded without a "--interpret-as=<language>" option to change binary-glyph correspondence. That's just not what Unicode should stand for.

You should not need to keep or infer the language hint. I know it was always the officially sanctioned way and is what developers engaged in i18n work have to live with. My point is NOT that you are wrong but that this part of the Unicode spec is wrong.

zahlman 2 days ago | parent | prev [-]

> Conflicting glyphs.

Which could be chosen between by using variation selectors.
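For illustration, a rough sketch of what that looks like at the code point level: a variation sequence is just the base character followed by a selector from U+FE00..U+FE0F or U+E0100..U+E01EF. The helper names are made up for this example, and the choice of U+845B with U+E0100 is an assumption; whether a different glyph is actually drawn depends on the sequence being registered and on the font supporting it.

```python
def is_variation_selector(ch: str) -> bool:
    """True for VS1..VS16 (U+FE00..U+FE0F) and VS17..VS256 (U+E0100..U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def strip_variation_selectors(text: str) -> str:
    """Drop all selectors, leaving only the plain (unified) code points."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


base = "\u845b"                   # a unified ideograph
variant = base + "\U000E0100"     # same ideograph plus VS17 (assumed example)

print(len(base), len(variant))    # 1 code point vs. 2 code points
print(strip_variation_selectors(variant) == base)
```

Note that the selector is carried per character in the text itself, so it has to be put there by whoever produces the string.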

numpad0 2 days ago | parent [-]

I guess, but I've never heard of a `cat text | ivs-convert --from=utf8 --to=zh-Hans` kind of tool. So in practice it's almost non-existent.