GuB-42 4 days ago
Is 21 bits really a sacrifice? It's about 2 million code points, and we currently use roughly a tenth of that. Even with all Chinese characters de-unified, all the notable historical and constructed scripts, technical symbols, and every submitted emoji (including the rejections), you are still well short of a million. We will probably never need more than 21 bits unless we start stretching the definition of what text is.
moefh 4 days ago | parent
It's not 2 million; it's a little over 1 million. The exact number is 1,112,064 = (2^16 - 2048) + 16*2^16: in UTF-16, 2 bytes can encode 2^16 - 2048 code points, and 4 bytes can encode another 16*2^16 (the 2048 surrogates aren't counted because they can never appear by themselves; they exist purely as an encoding mechanism for UTF-16).
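If anyone wants to check the arithmetic, here's a quick Python sketch (not from the comment above, just illustrative) that reproduces the count and shows how a supplementary-plane character gets split into a surrogate pair:

    # BMP code points minus the 2048 surrogates fit in one 16-bit unit;
    # the 16 supplementary planes need a surrogate pair (4 bytes).
    bmp = 2**16 - 2048           # 63,488
    supplementary = 16 * 2**16   # 1,048,576
    print(bmp + supplementary)   # 1112064

    # Example: U+1F600 is outside the BMP, so UTF-16 encodes it as a pair.
    cp = 0x1F600 - 0x10000
    high = 0xD800 + (cp >> 10)     # lead surrogate
    low  = 0xDC00 + (cp & 0x3FF)   # trail surrogate
    print(hex(high), hex(low))     # 0xd83d 0xde00
    print("\U0001F600".encode("utf-16-be").hex())  # d83dde00, matches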