acdha 7 hours ago
The need was clear even 30 years ago, when UTF-16 was standardized in 1996. UCS-2 was known at the time to be inadequate, but there was a period from the mid-80s to the early 90s when Western developers tried to pretend that they could get away with supporting only a tiny fraction of what Asian languages like Chinese (>50k characters, even if Han unification was uncontroversial), scholarly and technical usage, etc. actually need. The language used in 1988 was “Unicode aims in the first instance at the characters published in modern text (e.g. in the union of all newspapers and magazines printed in the world in 1988)” with the idea that other characters could be punted into a private registry. Once enough people accepted that this approach was impractical, UCS-2 was replaced with UTF-16 and surrogate codes. At that point it was clear that UTF-8 was better in almost every scenario: surrogate pairs made UTF-16 variable-width too, so neither encoding had an advantage for random access, and UTF-8 was usually substantially smaller.
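To make the size and random-access point concrete, here is a rough Python sketch. The helper name `encoded_sizes` and the sample strings are my own illustration, not anything from the standards. It counts code points, UTF-8 bytes, and UTF-16 code units for a few strings.

```python
def encoded_sizes(text: str) -> dict:
    """Return code point count and encoded sizes for a string.

    Hypothetical helper for illustration only.
    """
    utf8 = text.encode("utf-8")
    # "utf-16-le" avoids the 2-byte BOM that plain "utf-16" prepends.
    utf16 = text.encode("utf-16-le")
    return {
        "code_points": len(text),
        "utf8_bytes": len(utf8),
        "utf16_bytes": len(utf16),
        "utf16_units": len(utf16) // 2,  # 16-bit code units
    }


# Sample strings chosen for illustration.
samples = {
    "ascii": "hello",
    "cjk": "\u4e2d\u6587",        # two BMP Han characters
    "astral": "\U0001D11E",       # MUSICAL SYMBOL G CLEF, outside the BMP
}

for name, text in samples.items():
    print(name, encoded_sizes(text))
```

The astral sample is one code point but two UTF-16 code units (a surrogate pair), so indexing by code unit no longer lands on character boundaries, same as with UTF-8 bytes. Note the CJK sample is the case where UTF-16 is smaller (2 bytes per character versus 3), which is why "usually" rather than "always" smaller is the right claim; ASCII-heavy text such as markup and source code tips the balance toward UTF-8.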