chrismorgan, an hour ago
> Unicode code points are 32 bit

21-bit, actually. It was supposed to be 32-bit, but UTF-16 caps out at 21 bits, so they lopped eleven bits of potential off Unicode (and UTF-8, so no more six-byte encoding).

> at some point before Unicode

No, in the early days of Unicode.

> run length encodes

Um… what? RLE is a data compression thing; UTF-16 has nothing to do with it.
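To illustrate the UTF-16 limit mentioned above, here is a minimal Python sketch (the helper name `utf16_surrogates` is hypothetical, not from the comment). A surrogate pair carries 20 payload bits on top of an offset of 0x10000, so the highest value it can reach is 0x10FFFF, a 21-bit number. The sketch checks its hand-rolled split against Python's built-in UTF-16 codec and shows that UTF-8 now needs at most four bytes per code point.

```python
# Sketch: why UTF-16 limits Unicode to U+10FFFF (a 21-bit value).
# A surrogate pair carries 10 + 10 = 20 payload bits on top of an offset
# of 0x10000, so the largest value it can express is 0x10000 + 0xFFFFF.

def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Hypothetical helper: split a supplementary-plane code point into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    v = cp - 0x10000                 # 20-bit payload
    high = 0xD800 | (v >> 10)        # top 10 payload bits
    low = 0xDC00 | (v & 0x3FF)       # bottom 10 payload bits
    return high, low

MAX_CP = 0x10000 + 0xFFFFF           # 0x10FFFF, the largest encodable value

print(hex(MAX_CP), MAX_CP.bit_length())                   # 0x10ffff 21
print([hex(u) for u in utf16_surrogates(MAX_CP)])         # ['0xdbff', '0xdfff']
print(chr(MAX_CP).encode("utf-16-be").hex())              # dbffdfff
print(len(chr(MAX_CP).encode("utf-8")))                   # 4 (bytes, not 6)

# Python refuses anything past the UTF-16 ceiling.
try:
    chr(MAX_CP + 1)
except ValueError as err:
    print("U+110000 rejected:", err)
```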
Someone, 17 minutes ago
>> Unicode code points are 32 bit

> 21-bit, actually

Less than that. https://en.wikipedia.org/wiki/Code_point#In_character_encodi...:

“The Unicode code space is divided into seventeen planes (the basic multilingual plane, and 16 supplementary planes), each with 65,536 (= 2¹⁶) code points. Thus the total size of the Unicode code space is 17 × 65,536 = 1,114,112”

That makes it log(1,114,112)/log(2) bits, which is about 20.09. (https://www.unicode.org/versions/Unicode17.0.0/ assigns 159,801 of them to characters.)
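A quick check of that arithmetic, as a Python sketch: since 1,114,112 = 17 × 2¹⁶, the figure is exactly 16 + log₂17 ≈ 20.09. The sketch also prints the bit length of the highest code point, U+10FFFF, which shows how the two figures fit together: about 20.09 bits is the information content of the whole code space, while a fixed-width binary field can't hold fractional bits and still needs 21 to store every code point.

```python
import math

PLANES = 17
CODE_POINTS_PER_PLANE = 2 ** 16
code_space = PLANES * CODE_POINTS_PER_PLANE      # 1,114,112
bits = math.log2(code_space)                     # equals 16 + log2(17)
print(f"{code_space:,} code points -> {bits:.2f} bits")   # 1,114,112 code points -> 20.09 bits

# Whole bits needed to store the largest code point, 0x10FFFF (= code_space - 1).
print((code_space - 1).bit_length())             # 21
```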