weinzierl 4 days ago

I wouldn’t be so harsh. I think the Unicode Consortium not only started with good intentions but also did excellent work for the first decade or so.

I just think they got distracted when the problems got harder, and instead of tackling them head-on, they now waste a lot of their resources on busywork - good intentions notwithstanding. Sure, it’s more fun standardizing sparkling disco balls than dealing with real-world pain points. That OpenType is a good and powerful standard which masks some of Unicode’s shortcomings doesn’t really help.

It’s not too late, and I hope they will find their way back to their original mission and be braver in solving long-standing issues.

zahlman 4 days ago | parent | next [-]

A big part of the problem is that the reaction to early updates was so bad that they promised they would never un-assign or re-assign a code point ever again, making it impossible for them to actually correct any mistakes (not even typos in the official standard names given to characters).
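To make the point about names concrete, here is a small Python sketch. It assumes the standard-library unicodedata module, whose data follows whatever Unicode version your Python ships with. U+FE18 is a well-known case: its official name misspells "BRACKET", and the stability policy keeps the misspelling in place.

```python
import unicodedata

# U+FE18 was published with "BRACKET" misspelled as "BRAKCET".
# Character names fall under the stability policy, so the typo is permanent.
name = unicodedata.name("\uFE18")
print(name)
```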

The versioning is actually almost completely backwards by semver reasoning; 1.1 should have been 2.0, 2.0 should have been 3.0 and we should still be on 3.n now (since they have since kept the promise not to remove anything).

socalgal2 4 days ago | parent | prev | next [-]

What could be better? Human languages are complex

weinzierl 4 days ago | parent | next [-]

Yes, exactly, human languages are complex, and in my opinion Unicode used to be on a good track to tackle these complexities. I just think that nowadays they are not doing enough to help people around the world solve these problems.

pas 4 days ago | parent [-]

can you describe a few examples? what are you missing? or maybe are you aware of something they rejected that would be useful?

weinzierl 4 days ago | parent [-]

The elephant in the room is Han Unification but there are plenty of other issues. Here is one of my favourites from another thread just two days ago.

https://news.ycombinator.com/item?id=44971254

This is the rejected proposal.

https://www.unicode.org/L2/L2003/03215-n2593-umlaut-trema.pd...

If you read the thread linked above, you will find more examples from other people.

pas 3 days ago | parent [-]

thanks! very interesting!

ah, and now I understand what the hell people mean when they put dots on "coördinate"! (but they are obviously wrong, they should use the flying point from Catalan :)

... hm, so this issue is easily more than 20 years old. and since then there's no solution (or the German libraries consider the problem "solved" and ... no one else is making proposals to the WG about this nowadays)?

also, technically - since there are already more than 150K allocated code points - adding a different combining mark seems like the correct way to go, right?

or is it now universally accepted that people who want to type ambigüité need to remember to type U+034F before the ü? (... or, of course, it's up to their editor/typesetter software to offer this distinction)
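A rough Python sketch of what U+034F (COMBINING GRAPHEME JOINER) does mechanically, assuming the standard-library unicodedata module. As far as I understand the convention, the joiner sits between the base letter and U+0308 COMBINING DIAERESIS rather than in front of a precomposed ü. Which of the two sequences counts as "trema" and which as "umlaut" is purely a convention and is treated as an assumption here. The sketch only shows that the joiner survives normalization and keeps the two spellings distinct.

```python
import unicodedata

plain = "u\u0308"          # u + COMBINING DIAERESIS
marked = "u\u034F\u0308"   # u + COMBINING GRAPHEME JOINER + COMBINING DIAERESIS

nfc_plain = unicodedata.normalize("NFC", plain)
nfc_marked = unicodedata.normalize("NFC", marked)

# The plain sequence composes to the single code point U+00FC.
print([f"U+{ord(c):04X}" for c in nfc_plain])
# The joiner blocks composition, so all three code points remain.
print([f"U+{ord(c):04X}" for c in nfc_marked])
```

Both strings normally render the same, so the distinction is only visible to software that looks for the joiner.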

regarding the Han unification, is there some kind of effort to "fix" that? (adding language-start/language-end markers perhaps? or virtual code points for languages, to avoid the need to search strings for the begin/end markers?)

pas 4 days ago | parent | prev [-]

sure, but they have both human and machine stuff in the same "universe". again, sure, it made sense, but maybe it would also make sense to have a parser that helps recover the "human stuff" from the "machine gibberish" (i.e. filters out the presentation and control stuff). but of course some in-band logic makes sense, after all, for the combinations (diacritics, emoji skin color, and so on).
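A minimal sketch of such a filter in Python, assuming "presentation and control stuff" roughly means the general categories Cc (control) and Cf (format). That mapping is my assumption, and strip_controls is a hypothetical helper name. Note that Cc also covers newlines and tabs, so a real filter would need exceptions.

```python
import unicodedata

def strip_controls(text: str) -> str:
    """Drop code points whose general category is Cc (control) or Cf (format)."""
    return "".join(
        ch for ch in text if unicodedata.category(ch) not in {"Cc", "Cf"}
    )

sample = "abc\u200e def\u0007"   # LEFT-TO-RIGHT MARK and BEL mixed into the text
cleaned = strip_controls(sample)
print(repr(cleaned))

# ZERO WIDTH JOINER (U+200D) is also category Cf, so the same filter
# splits a joined emoji sequence into its separate parts.
couple = "\U0001F468\u200D\U0001F469"   # man + ZWJ + woman
split = strip_controls(couple)
print(len(couple), len(split))
```

The second half is the in-band caveat: the joiner carries meaning for humans, so a blanket filter on these categories is too blunt.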

yk 4 days ago | parent | prev [-]

I would. The original sin of Unicode is really their manifold idea. At that point they stopped trying to write a string standard and started to become a kind of general description of what string standards should look like, in the hope that string standards that more or less conform to this description are interoperable, as long as you remember which direction "string".decode() and "string".encode() go.
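For reference, a short sketch of the two directions, assuming Python 3 and UTF-8 as the encoding. In Python 3 only str has encode() and only bytes has decode(), so "string".decode() from the sentence above would have to be called on a bytes object.

```python
text = "ambig\u00fcit\u00e9"      # "ambigüité" as abstract code points (str)

data = text.encode("utf-8")       # str -> bytes: pick a concrete encoding
back = data.decode("utf-8")       # bytes -> str: interpret the bytes again

print(len(text), len(data))       # code points vs. bytes
print(back == text)
```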