chuckadams 2 hours ago
> It would have been expensive, but all characters should have been fixed-size 64-bit values.

It would have been a non-starter, and then we'd all be dealing with Shift-JIS, Big5, and FSM knows how many different code pages to this day. UTF-8 is about as elegant as it gets, though Java and JS still managed to fuck that up too (they both encode every codepoint outside the BMP as surrogate pairs in UTF-8).
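To put rough numbers on "expensive": a minimal Python sketch, assuming the hypothetical fixed-width scheme stores every code point in 8 bytes. `utf8_len` and `compare_sizes` are illustrative names, not library functions; the thresholds follow UTF-8's standard 1-to-4-byte layout.

```python
def utf8_len(cp: int) -> int:
    """Bytes UTF-8 needs for a single code point (1 to 4)."""
    if cp < 0x80:
        return 1      # ASCII stays a single byte
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3      # remainder of the BMP
    return 4          # supplementary planes, up to U+10FFFF


def compare_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 size, hypothetical fixed 64-bit size) in bytes."""
    utf8 = sum(utf8_len(ord(ch)) for ch in text)
    fixed64 = 8 * len(text)  # len() counts code points in a Python str
    return utf8, fixed64


print(compare_sizes("h\u00e9llo \U0001F600"))
```

For `"h\u00e9llo \U0001F600"` (seven code points) this gives 11 bytes in UTF-8 versus 56 bytes in the fixed 64-bit scheme.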
chrismorgan an hour ago
> Java and JS […] both encode every codepoint outside the BMP as surrogate pairs in UTF-8

I can’t comment on Java, but JS I know reasonably well, and I can’t think of any place it uses CESU-8.
dasyatidprime an hour ago
That's called CESU-8. https://www.unicode.org/reports/tr26/tr26-4.html
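A minimal Python sketch of what CESU-8 does to a supplementary character, next to real UTF-8. `cesu8_encode` is a hypothetical helper, not a standard codec: it splits each code point above U+FFFF into a UTF-16 surrogate pair and writes each surrogate as its own 3-byte sequence, as TR26 describes. (Java's "modified UTF-8", used by `DataOutputStream.writeUTF` and JNI, encodes supplementary characters the same way, though it additionally writes U+0000 as two bytes.)

```python
def cesu8_encode(text: str) -> bytes:
    """Hypothetical CESU-8 encoder (sketch, not a stdlib codec).

    BMP code points are encoded exactly as in UTF-8; anything above
    U+FFFF is first split into a UTF-16 surrogate pair, and each
    surrogate is then written as a 3-byte UTF-8-style sequence.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            v = cp - 0x10000
            high = 0xD800 + (v >> 10)   # top 10 bits of the offset
            low = 0xDC00 + (v & 0x3FF)  # bottom 10 bits of the offset
            for unit in (high, low):
                # "surrogatepass" lets Python emit the 3-byte form of a surrogate
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


emoji = "\U0001F600"  # U+1F600, outside the BMP
print(cesu8_encode(emoji).hex(" "))    # ed a0 bd ed b8 80  (6 bytes)
print(emoji.encode("utf-8").hex(" "))  # f0 9f 98 80        (4 bytes)
```

Python's strict UTF-8 decoder rejects the CESU-8 bytes (decoding them with `"utf-8"` raises `UnicodeDecodeError`), which is why the two have to be treated as distinct encodings.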