amluto 2 hours ago
Modern string libraries largely use UTF-8 [0], and surrogates, regardless of whether they’re paired, are invalid in UTF-8. So, in a modern string library, as built in to most modern languages, you will not encounter surrogates except when translating between encodings.

[0] But everyone disagrees as to what indexing a string means, so you need to make an actual choice if you want anything involving indexing to match across languages.
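To make the "surrogates are invalid in UTF-8" point concrete, here is a minimal sketch using the JDK's strict `CharsetEncoder` and `CharsetDecoder`. The class and method names (`SurrogateUtf8Check`, `encodesToUtf8`, `decodesFromUtf8`) are made up for illustration and are not from any library.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class SurrogateUtf8Check {

    // True if the string can be encoded as well-formed UTF-8.
    // REPORT makes the encoder throw instead of substituting '?'.
    static boolean encodesToUtf8(String s) {
        try {
            StandardCharsets.UTF_8.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // True if the bytes are well-formed UTF-8 according to the JDK decoder.
    static boolean decodesFromUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A lone high surrogate has no UTF-8 encoding.
        String loneSurrogate = String.valueOf((char) 0xD83D);
        System.out.println(encodesToUtf8(loneSurrogate));

        // ED A0 80 is what U+D800 would look like if surrogates were
        // encoded as ordinary three-byte sequences.
        byte[] surrogateBytes = {(byte) 0xED, (byte) 0xA0, (byte) 0x80};
        System.out.println(decodesFromUtf8(surrogateBytes));

        // Even a matched pair, each half encoded separately, is rejected.
        byte[] pairedBytes = {
            (byte) 0xED, (byte) 0xA0, (byte) 0xBD,   // U+D83D
            (byte) 0xED, (byte) 0xB8, (byte) 0x80    // U+DE00
        };
        System.out.println(decodesFromUtf8(pairedBytes));
    }
}
```

All three calls in `main` should report failure, while the four-byte sequence `F0 9F 98 80` (U+1F600) decodes fine. The default `String.getBytes` and `new String(bytes, ...)` paths replace malformed input rather than throwing, which is why the sketch goes through the explicit encoder and decoder.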
chuckadams an hour ago
> surrogates, regardless of whether they’re paired, are invalid in UTF-8

Java did not get the memo. Since the char type is fixed at 16 bits, it uses surrogates to encode everything outside the BMP, regardless of the encoding.
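A small sketch of what that looks like in practice, using only standard `String` and `Character` methods. The class name `JavaSurrogateDemo` and its helper methods are hypothetical. It also shows why "indexing" is ambiguous: the same string has three different lengths depending on the unit you count.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class JavaSurrogateDemo {

    // U+1F600 lies outside the BMP, so it needs two 16-bit chars in Java.
    static final String GRINNING = new String(Character.toChars(0x1F600));

    // Length in UTF-16 code units (what String.length and charAt index by).
    static int utf16Units(String s) {
        return s.length();
    }

    // Length in Unicode code points.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Length in bytes once encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf16Units(GRINNING));   // 2
        System.out.println(codePoints(GRINNING));   // 1
        System.out.println(utf8Bytes(GRINNING));    // 4

        // The two chars are a surrogate pair: D83D followed by DE00.
        char high = GRINNING.charAt(0);
        char low = GRINNING.charAt(1);
        System.out.println(Character.isHighSurrogate(high)
                && Character.isLowSurrogate(low));  // true
        System.out.println(Integer.toHexString(high) + " "
                + Integer.toHexString(low));        // d83d de00
    }
}
```

Note that the surrogates show up in the `char`-level view of the string. The bytes produced by `getBytes(StandardCharsets.UTF_8)` are the ordinary four-byte UTF-8 sequence for U+1F600, with no surrogates in them.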