pvillano 4 hours ago

Unicode is "designed to support the use of text in all of the world's writing systems that can be digitized"

Unicode needs tab, space, form feed, and carriage return.

Unicode needs U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK to switch between left-to-right and right-to-left languages.

Unicode needs U+115F HANGUL CHOSEONG FILLER and U+1160 HANGUL JUNGSEONG FILLER to typeset Korean.

Unicode needs U+200C ZERO WIDTH NON-JOINER to encode that two characters should not be connected by a ligature.

Unicode needs U+200B ZERO WIDTH SPACE to indicate a word break opportunity without actually inserting a visible space.

Unicode needs MONGOLIAN FREE VARIATION SELECTORs to encode the traditional Mongolian alphabet.
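
For anyone who wants to poke at these, here is a minimal sketch using Python's standard `unicodedata` module to print the name and general category of the code points listed above. The `CODE_POINTS` list and the `describe` helper are my own illustration, and U+180B is just one of the Mongolian free variation selectors, picked as a representative.

```python
import unicodedata

# Code points mentioned in the comment above; the selection is illustrative.
CODE_POINTS = [0x0009, 0x200E, 0x200F, 0x115F, 0x1160, 0x200C, 0x200B, 0x180B]


def describe(cp: int) -> tuple[str, str, str]:
    """Return (U+XXXX label, character name, general category) for a code point."""
    ch = chr(cp)
    # ASCII controls such as TAB have no formal name, so supply a fallback.
    name = unicodedata.name(ch, "<unnamed control>")
    return (f"U+{cp:04X}", name, unicodedata.category(ch))


if __name__ == "__main__":
    for cp in CODE_POINTS:
        label, name, category = describe(cp)
        print(f"{label}  {category}  {name}")
```

Most of these report the general category `Cf` (format), which is how Unicode itself classifies characters that have no glyph of their own but affect the text around them.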

WalterBright 3 hours ago | parent [-]

[flagged]

bulbar 3 hours ago | parent | next [-]

That's a very narrow view of the world. One example: in the past I have handled bilingual English-Arabic files that switch direction within the same line, because Arabic is written from right to left.

There are also languages that are written from top to bottom.

Unicode is not exclusively for coding. On the contrary, I'm pretty sure coding is only a small fraction of how Unicode is used.

> Somehow people didn't need invisible characters when printing books.

They didn't need computers either so "was seemingly not needed in the past" is not a good argument.

WalterBright 32 minutes ago | parent | next [-]

> That's a very narrow view of the world.

Yes, it is. Unicode has undergone major mission creep, thinking it is now a font language and a formatting language. Naturally, this has led to it becoming a vector for malicious actors. (The direction-reversing thing has been used to insert malicious text that isn't visible to the reader.)

> Unicode is not exclusively for coding

I never mentioned coding.

> They didn't need computers

Unicode is for characters, not formatting. Formatting is what HTML is for, and many other formatting standards. Neither is it for meaning.
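
To make the direction-reversing point concrete, here is a rough sketch of how one might flag the explicit bidirectional control characters in a string. It relies on `unicodedata.bidirectional` from the Python standard library; the `find_bidi_controls` helper and the set of classes it checks are my own choices for illustration, not any particular tool's implementation.

```python
import unicodedata

# Bidi classes of the explicit embedding, override, and isolate controls
# (U+202A..U+202E and U+2066..U+2069). The selection is an assumption of
# this sketch; a stricter check might also flag U+200E, U+200F, and U+061C.
BIDI_CONTROL_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def find_bidi_controls(text: str) -> list[tuple[int, str]]:
    """Return (index, character name) for each explicit bidi control in text."""
    hits = []
    for index, ch in enumerate(text):
        if unicodedata.bidirectional(ch) in BIDI_CONTROL_CLASSES:
            hits.append((index, unicodedata.name(ch, "<unnamed>")))
    return hits


if __name__ == "__main__":
    # U+202E RIGHT-TO-LEFT OVERRIDE makes the following text display reversed.
    sample = "ab\u202Ecd"
    for index, name in find_bidi_controls(sample):
        print(f"index {index}: {name}")
```

A check like this only reports that the characters are present. Whether they are legitimate (say, in a bilingual document) or hostile is a judgment the code cannot make.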

pibaker an hour ago | parent | prev [-]

> That's a very narrow view of the world.

But not one that would surprise anyone familiar with WalterBright's antics on this website…

jmusall 2 hours ago | parent | prev | next [-]

The fact is that there were so many character sets in use before Unicode because all these things were needed or at least wanted by a lot of people. Here's a great blog post by Nikita Prokopov about it: https://tonsky.me/blog/unicode/

WalterBright 3 hours ago | parent | prev | next [-]

    Look Ma
    xt! N !
    e tee S
    T larip
(No Unicode needed.)

chongli 3 hours ago | parent | prev [-]

Unicode is for human beings, not machines.

WalterBright 28 minutes ago | parent [-]

How does invisible Unicode text fit into that?