account42 7 days ago
ASCII compatibility isn't the only advantage of UTF-8 over UCS-4. It also

- requires less memory for most strings, particularly ones that are largely limited to ASCII, as structured text-based formats often are.
- doesn't need to care about byte order. UTF-8 is always UTF-8, while UTF-16 might be either little or big endian and UCS-4 could theoretically even be mixed endian.
- doesn't need to care about alignment: if you jump to a random memory position, you can find the next and previous UTF-8 characters. This also means that you can use preexisting byte-based string functions like substring search for many UTF-8 operations.
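To make the last point concrete, here is a rough Python sketch of resynchronizing from an arbitrary byte offset. It assumes the input is valid UTF-8; the helper names (`is_continuation`, `char_start`, `next_char_start`) and the sample string are made up for illustration, not taken from any library.

```python
def is_continuation(byte: int) -> bool:
    # UTF-8 continuation bytes always have the bit pattern 10xxxxxx.
    return (byte & 0xC0) == 0x80


def char_start(data: bytes, pos: int) -> int:
    """Index of the first byte of the character that covers data[pos]."""
    # Assumes valid UTF-8: walk back over at most three continuation bytes.
    while pos > 0 and is_continuation(data[pos]):
        pos -= 1
    return pos


def next_char_start(data: bytes, pos: int) -> int:
    """Index of the first character start strictly after pos (or len(data))."""
    pos += 1
    while pos < len(data) and is_continuation(data[pos]):
        pos += 1
    return pos


# Hypothetical sample text; the euro sign occupies bytes 7, 8 and 9.
text = "naïve €5 café".encode("utf-8")

middle_of_euro = 8
start = char_start(text, middle_of_euro)
end = next_char_start(text, middle_of_euro)
euro = text[start:end].decode("utf-8")

# Plain byte-based substring search works too: lead bytes and continuation
# bytes never overlap, so a match on a valid UTF-8 needle always begins on a
# character boundary.
cafe_offset = text.find("café".encode("utf-8"))
```

The same approach does not carry over to UTF-16 or UCS-4 byte streams, where landing on an odd offset gives no in-band hint about where a code unit begins.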