chuckadams 2 hours ago

> It would have been expensive, but all characters should have been fixed size 64bit values.

It would have been a non-starter, and then we'd all be dealing with Shift-JIS, BIG5, and FSM knows how many different codepages to this day. UTF-8 is about as elegant as it gets, though Java and JS still managed to fuck that up too (they both encode every codepoint outside the BMP as surrogate pairs in UTF-8).

chrismorgan an hour ago | parent | next [-]

> Java and JS […] both encode every codepoint outside the BMP as surrogate pairs in UTF-8

I can’t comment on Java, but JS I know reasonably well and I can’t think of any place it uses CESU-8.

dasyatidprime an hour ago | parent | prev [-]

That's called CESU-8. https://www.unicode.org/reports/tr26/tr26-4.html
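
For anyone who wants to see the difference in bytes, here is a minimal Python sketch. UTF-8 encodes a code point outside the BMP as a single 4-byte sequence, while CESU-8 first splits it into a UTF-16 surrogate pair and then encodes each surrogate as its own 3-byte sequence, giving 6 bytes. The function name `cesu8_encode` is hypothetical (Python's standard library has no CESU-8 codec), and this is an illustration of the encoding, not a claim about what Java or JS do internally.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8 (hypothetical helper, for illustration only)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            # Supplementary code point: split into a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 | (cp >> 10)
            low = 0xDC00 | (cp & 0x3FF)
            # Encode each surrogate as a separate 3-byte sequence.
            # "surrogatepass" lets Python emit bytes for lone surrogates.
            for unit in (high, low):
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            # BMP code points are encoded exactly as in UTF-8.
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


emoji = "\U0001F600"  # U+1F600, outside the BMP
print(emoji.encode("utf-8").hex())  # f09f9880 (4 bytes, real UTF-8)
print(cesu8_encode(emoji).hex())    # eda0bdedb880 (6 bytes, CESU-8)
```

For text that stays inside the BMP the two encodings produce identical bytes, so the difference only shows up with supplementary characters such as emoji.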