tralarpa 7 days ago
Fascinating and annoying problem, indeed. In Java, the correct way to iterate over the characters (Unicode scalar values) of a string is to use the IntStream returned by String::codePoints (available since Java 8), but I bet 99.9999% of existing code iterates over 16-bit chars.
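A minimal sketch of the difference described above (the class and method names are hypothetical). `String.length()` and `charAt` work in UTF-16 code units, so a character outside the Basic Multilingual Plane shows up as two surrogate halves, while `codePoints()` combines each surrogate pair into one value.

```java
import java.util.stream.Collectors;

public class CodePointDemo {
    // Number of UTF-16 code units (what length() and charAt() operate on).
    public static int charCount(String s) {
        return s.length();
    }

    // Number of code points; surrogate pairs are combined into single values.
    public static long codePointCount(String s) {
        return s.codePoints().count();
    }

    // Formats each code point as U+XXXX for inspection.
    public static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // "a" followed by U+1F600 GRINNING FACE, written as its surrogate pair.
        String s = "a\uD83D\uDE00";
        System.out.println(charCount(s));      // 3
        System.out.println(codePointCount(s)); // 2
        System.out.println(describe(s));       // U+0061 U+1F600

        // A naive char loop sees the two surrogate halves as separate "characters".
        for (int i = 0; i < s.length(); i++) {
            System.out.printf("char %d: U+%04X%n", i, (int) s.charAt(i));
        }
    }
}
```

As the replies below point out, this only fixes the surrogate-pair layer of the problem.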
zahlman 6 days ago | parent
This does not fix the problem. The emoji consists of multiple Unicode characters (in turn represented 1:1 by the integer "code point" values). There is much more to it than the problem of surrogate pairs.
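To make this concrete, here is a sketch using the "face palm, medium-light skin tone, male sign" emoji as an example. That choice is an assumption, since the thread doesn't say which emoji is meant, and the class name is hypothetical. It renders as one symbol but is five code points, so `codePoints()` fixes the 7-char surrogate miscount without giving you "1".

```java
import java.util.stream.Collectors;

public class EmojiCodePoints {
    // Assumed example: FACE PALM + EMOJI MODIFIER FITZPATRICK TYPE-3 (skin tone)
    // + ZERO WIDTH JOINER + MALE SIGN + VARIATION SELECTOR-16.
    static final int[] FACEPALM = {0x1F926, 0x1F3FC, 0x200D, 0x2642, 0xFE0F};

    // Builds the string from its code points; the two supplementary ones
    // each become a surrogate pair in the underlying UTF-16.
    public static String facepalm() {
        return new String(FACEPALM, 0, FACEPALM.length);
    }

    // Lists a string's code points as U+XXXX values.
    public static String codePointList(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String s = facepalm();
        System.out.println(s.length());             // 7 UTF-16 code units
        System.out.println(s.codePoints().count()); // 5 code points
        System.out.println(codePointList(s));       // U+1F926 U+1F3FC U+200D U+2642 U+FE0F
    }
}
```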
ivanjermakov 6 days ago | parent
A code point is not a grapheme cluster, and a grapheme cluster is not a character. I bet there is a "50 falsehoods about Unicode" list.
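A small sketch of the code point vs. cluster distinction, using `java.text.BreakIterator.getCharacterInstance()`, whose boundaries approximate grapheme clusters (the class and method names are hypothetical). How it segments emoji sequences depends on the JDK version, so this sticks to a combining accent, whose segmentation is uncontroversial.

```java
import java.text.BreakIterator;
import java.text.Normalizer;

public class GraphemeDemo {
    // Counts segments found by BreakIterator's character instance,
    // which approximates user-perceived characters (grapheme clusters).
    public static int graphemeCount(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "cafe" followed by U+0301 COMBINING ACUTE ACCENT, which attaches to the "e".
        String decomposed = "cafe\u0301";
        // NFC composes "e" + accent into the single code point U+00E9.
        String composed = Normalizer.normalize(decomposed, Normalizer.Form.NFC);

        System.out.println(decomposed.codePoints().count()); // 5 code points
        System.out.println(graphemeCount(decomposed));       // 4 clusters
        System.out.println(composed.codePoints().count());   // 4 code points
        System.out.println(graphemeCount(composed));         // 4 clusters
    }
}
```

The code point count changes with normalization while the cluster count stays the same, which is one way to see that code points aren't what a reader would call characters.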