torstenvl 6 days ago
I really wish people would stop giving this bad advice, especially so stridently. Like it or not, code points are how Unicode works. Telling people to ignore code points is telling people to ignore how data works. It's of the same philosophy that results in abstraction built on abstraction built on abstraction, with no understanding. I vehemently dissent from this view.
shiomiru 6 days ago
> Telling people to ignore code points

Nobody is saying that. The point is that if you're parsing Unicode by counting code points, you're doing it wrong. The way you actually parse Unicode text (in 99% of cases) is by iterating through the code points, and then the actual count is fairly irrelevant; it's just a stream.

Other uses of code point length are also questionable: for measurement it's useless, for bounds checking (random access) it's inefficient. It may be useful in some edge cases, but TFA's point is that a general-purpose language's default string type shouldn't optimize for edge cases.
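To make the "useless for measurement" point concrete, here is a minimal Python sketch. The sample string (a thumbs-up emoji followed by a skin-tone modifier) is my own choice for illustration, not something from TFA. It renders as one user-perceived character, yet none of the usual "lengths" comes out as 1:

```python
# One user-perceived character: THUMBS UP SIGN + EMOJI MODIFIER FITZPATRICK TYPE-4
s = "\U0001F44D\U0001F3FD"

code_points = len(s)                           # Python's len() counts code points
utf8_bytes = len(s.encode("utf-8"))            # storage size in UTF-8
utf16_units = len(s.encode("utf-16-le")) // 2  # UTF-16 code units

print(code_points, utf8_bytes, utf16_units)    # 2 8 4

# Parsing is just walking the stream; the total count never comes into it.
for cp in s:
    print(f"U+{ord(cp):04X}")
```

The code point count (2) is neither the storage size nor what the user sees, which is why it rarely answers the question being asked.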
dcrazy 6 days ago
You’re arguing against a strawman. The advice wasn’t to ignore learning about code points; it’s that if your solution to a problem involves reasoning about code points, you’re probably doing it wrong and are likely to make a mistake.

Trying to handle code points as atomic units fails even in trivial and extremely common cases like diacritics, before you even get to more complicated situations like emoji variants. Solving pretty much any real-world problem involving a Unicode string requires factoring in canonical forms, equivalence classes, collation, and even locale. Many problems can’t even be solved at the _character_ (grapheme) level: text selection, for example, has to be handled at the grapheme _cluster_ level. And even then you need a rich understanding of those graphemes to know whether to break them apart for selection (ligatures like fi) or keep them intact (Hangul jamo).

Yes, people should learn about code points. Including why they aren’t the level they should be interacting with strings at.
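As a rough illustration of the diacritics case, here is a small Python sketch using the standard `unicodedata` module. The two spellings of "é" are my own example. They look identical on screen but differ in code point count and compare unequal until they are brought to the same canonical form:

```python
import unicodedata

composed = "\u00e9"     # é as one precomposed code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

print(len(composed), len(decomposed))  # 1 2
print(composed == decomposed)          # False: raw code point comparison

# Normalizing both sides to NFC makes the canonically equivalent strings compare equal.
same = unicodedata.normalize("NFC", composed) == unicodedata.normalize("NFC", decomposed)
print(same)                            # True
```

Normalization only covers canonical equivalence; collation, locale rules, and grapheme cluster segmentation need more than the standard library offers here.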
eviks 6 days ago
> Telling people to ignore code points is telling people to ignore how data works.

No, it's telling people that they don't understand how data works; otherwise they'd be using a different unit of measurement.