mort96 | 4 days ago
That isn't really a case of UTF-8 sacrificing anything to be compatible with UTF-16. It's Unicode, not UTF-8, that made the sacrifice: Unicode is limited to 21 bits because of UTF-16. The UTF-8 design extends trivially to six-byte sequences covering values up to 31 bits. But why would UTF-8, a Unicode character encoding, support code points which Unicode has promised will never and can never exist?
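A minimal Python sketch of that point, assuming the original pre-RFC 3629 encoding rules (which allowed 5- and 6-byte sequences; the function name here is made up for illustration):

    # Sketch of the original (pre-RFC 3629) UTF-8 scheme, which extends
    # naturally to 31-bit values; standard UTF-8 now stops at 4 bytes / U+10FFFF.
    def encode_utf8_31bit(cp: int) -> bytes:
        if cp < 0x80:
            return bytes([cp])          # 1-byte form: 0xxxxxxx
        # (length, leading-byte marker, max value) for the 2..6-byte forms
        forms = [(2, 0xC0, 0x7FF), (3, 0xE0, 0xFFFF), (4, 0xF0, 0x1FFFFF),
                 (5, 0xF8, 0x3FFFFFF), (6, 0xFC, 0x7FFFFFFF)]
        for length, marker, limit in forms:
            if cp <= limit:
                tail = []
                for _ in range(length - 1):
                    tail.append(0x80 | (cp & 0x3F))   # continuation byte: 10xxxxxx
                    cp >>= 6
                return bytes([marker | cp] + tail[::-1])
        raise ValueError("code point exceeds 31 bits")

    print(encode_utf8_31bit(0x10FFFF).hex())    # f48fbfbf   -- same as standard UTF-8
    print(encode_utf8_31bit(0x7FFFFFFF).hex())  # fdbfbfbfbfbf -- 6 bytes, beyond Unicode

RFC 3629 later restricted UTF-8 to four bytes and a ceiling of U+10FFFF, matching what UTF-16 can reach.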
MyOutfitIsVague | 4 days ago
In an ideal future (read: fantasy), UTF-16 gets formally deprecated and trashed, freeing the surrogate code points and the full range for UTF-8. Or UTF-16 is officially demoted to second-class citizen, and some code points are simply out of its reach.
GuB-42 | 4 days ago
Is 21 bits really a sacrifice? That's about 2 million code points, and we currently use roughly a tenth of that. Even with all Chinese characters de-unified, all the notable historical and constructed scripts, technical symbols, and every submitted emoji, including the rejections, you are still well short of a million. We will probably never need more than 21 bits unless we start stretching the definition of what text is.
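Rough numbers behind the "about a tenth" estimate (a back-of-the-envelope check; the ~150k count of assigned characters is approximate as of recent Unicode versions):

    print(2 ** 21)               # 2,097,152 values in a full 21-bit space
    print(0x10FFFF + 1)          # 1,114,112 code points Unicode actually allows (the UTF-16 limit)
    print(0x10FFFF + 1 - 2048)   # 1,112,064 scalar values once the surrogates are excluded
    print(round(150_000 / 1_114_112, 2))  # ~0.13 -- roughly 150k assigned characters vs. available space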