matja 4 days ago
> You can't even clearly define what an "atomic sequence of glyphs" is.

Kinda. Grapheme cluster breaks are defined in Unicode, but they have all the baggage and edge cases you'd expect from human languages evolving over time, so they can be encoded in as few as a thousand rules: https://github.com/unicode-org/icu/tree/main/icu4c/source/da...
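To give a feel for why the rule count balloons, here is a rough Python sketch using only the standard library. `naive_clusters` is a made-up helper, not ICU's algorithm and not the Unicode segmentation rules: it only glues combining marks onto the preceding character, which handles simple accents and falls over on emoji joined with ZWJ.

```python
import unicodedata

def naive_clusters(text):
    """Hypothetical helper: attach combining marks to the preceding character.

    Deliberately simplistic. It ignores ZWJ sequences, Hangul jamo,
    regional indicator pairs, and the other cases the real rules cover.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch   # nonzero combining class: extend previous cluster
        else:
            clusters.append(ch)  # otherwise start a new cluster
    return clusters

accented = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # man, ZWJ, woman, ZWJ, girl

print(len(accented), len(naive_clusters(accented)))  # code points vs. naive clusters
print(len(family), len(naive_clusters(family)))
```

The accented "e" comes out as one cluster, but the family emoji is split into five pieces even though Unicode's extended grapheme cluster rules treat it as a single unit. Each gap like that needs another rule, and that is before tailoring for individual scripts.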