thaumasiotes 4 days ago
> Falsehoods programmers believe about written language: whitespace is used to separate atomic sequences of runes.
Really? That isn't just untrue of written language in general. It's untrue of every individual written language in particular. You can't even clearly define what an "atomic sequence of glyphs" is.
matja 4 days ago
> You can't even clearly define what an "atomic sequence of glyphs" is.
Kinda. Grapheme cluster breaks are defined in Unicode, but they have all the baggage and edge cases you'd expect from human languages evolving over time, so they can be encoded in as few as a thousand rules: https://github.com/unicode-org/icu/tree/main/icu4c/source/da...
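To make the gap between code points and user-perceived characters concrete, here is a rough Python sketch. It is not the Unicode segmentation algorithm: `approx_clusters` is a made-up helper that applies a single simplified rule (keep a combining mark attached to the code point before it) and ignores everything else the real rules handle, such as emoji sequences, Hangul syllables and regional indicators.

```python
import unicodedata


def approx_clusters(text):
    """Group each code point with the combining marks that follow it.

    Hypothetical helper for illustration only: this is one simplified rule,
    not the full Unicode grapheme cluster boundary algorithm.
    """
    clusters = []
    for ch in text:
        # A nonzero canonical combining class means "attaches to what came before".
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


word = "cafe\u0301"  # "café" spelled with a separate combining acute accent
print(len(word))                   # 5 code points
print(len(approx_clusters(word)))  # 4 user-perceived characters
```

Even this toy version shows why neither whitespace nor code point counts give you an "atomic" unit; the real rule set covers far more cases than this one.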
rune-space 3 days ago
Which makes one wonder why REST puts so much weight on them being divided by WS!