matja 4 days ago

> You can't even clearly define what an "atomic sequence of glyphs" is.

Kinda. Grapheme cluster breaks are defined in Unicode (UAX #29), but they have all the baggage and edge cases you'd expect from human languages evolving over time, so they can be encoded in as few as a thousand rules: https://github.com/unicode-org/icu/tree/main/icu4c/source/da...
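
To give a feel for what those rules have to do, here is a rough Python sketch. It is a deliberately tiny approximation, not the UAX #29 algorithm and not ICU's rule set: the names `is_extender` and `rough_clusters` are made up for illustration, and the only "rules" are "combining marks and ZWJ stick to the previous code point" and "whatever follows a ZWJ sticks too".

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def is_extender(ch: str) -> bool:
    """Simplified assumption: combining marks and ZWJ attach to the previous code point."""
    # Mn/Me/Mc are the combining-mark categories; variation selectors are Mn.
    return unicodedata.category(ch) in ("Mn", "Me", "Mc") or ch == ZWJ


def rough_clusters(text: str) -> list[str]:
    """Group code points into approximate grapheme clusters (hypothetical helper)."""
    clusters: list[str] = []
    for ch in text:
        # Attach to the previous cluster if ch extends it,
        # or if the previous cluster ends with a joiner.
        if clusters and (is_extender(ch) or clusters[-1].endswith(ZWJ)):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


accented = "e\u0301"  # 'e' + COMBINING ACUTE ACCENT: 2 code points
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"  # man ZWJ woman ZWJ girl: 5 code points
flag = "\U0001F1EB\U0001F1F7"  # two regional indicators (F, R)

print(len(accented), len(rough_clusters(accented)))
print(len(family), len(rough_clusters(family)))
print(len(flag), len(rough_clusters(flag)))
```

The accented letter and the family emoji each come out as one cluster, but the flag comes out as two, because the sketch knows nothing about regional indicator pairs. Skin-tone modifiers, Hangul jamo, CRLF and the rest are similarly missing, which is roughly where the other rules go.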