BobbyTables2 | 2 hours ago
Damn, I've never really had to deal with Unicode all that much. It was already bad enough that instead of bytes we have to worry about code points. Now even that isn't enough? It would have been expensive, but all characters should have been fixed-size 64-bit values.
usrnm | 2 hours ago
> It would have been expensive, but all characters should have been fixed-size 64-bit values

You're making the same mistake that numerous people made before you: thinking it's as simple as using arrays of large-enough numbers. First they thought two bytes per symbol would be enough, then four. Spoiler alert: it wasn't. And eight won't work either.
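The point can be seen concretely: even with fixed-size code units, one user-perceived character can span several code points, so "one array slot per character" never materializes no matter how wide the slot is. A minimal Python sketch with the family emoji (four emoji joined by zero-width joiners):

```python
# One user-perceived character ("family" emoji) built from seven code points:
# WOMAN + ZWJ + WOMAN + ZWJ + GIRL + ZWJ + BOY
family = "\U0001F469\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

print(len(family))                       # 7 code points, not 1
print(len(family.encode("utf-32-be")))   # 28 bytes, even with fixed 4-byte units
```

Widening each unit to 64 bits would double the storage to 56 bytes and still leave you with seven units for one character; grapheme segmentation (UAX #29) is needed regardless.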
chuckadams | 2 hours ago
> It would have been expensive, but all characters should have been fixed-size 64-bit values.

It would have been a non-starter, and then we'd all be dealing with Shift-JIS, Big5, and FSM knows how many other codepages to this day. UTF-8 is about as elegant as it gets, though Java and JS still managed to fuck that up too: they both encode code points outside the BMP as surrogate pairs in UTF-8.
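The quirk described here is essentially CESU-8 (Java's "modified UTF-8" does the same): the UTF-16 surrogate halves get UTF-8-encoded individually instead of the code point itself, yielding six bytes where proper UTF-8 uses four. A rough Python sketch of the byte-level difference, using the `surrogatepass` error handler to force the non-standard encoding:

```python
# U+1F600 GRINNING FACE lies outside the BMP.
s = "\U0001F600"

utf8 = s.encode("utf-8")   # proper UTF-8: 4 bytes, f0 9f 98 80
print(utf8.hex())

# CESU-8-style: split into UTF-16 surrogates, then UTF-8-encode each half.
# The arithmetic below is the standard UTF-16 surrogate-pair construction.
cp = ord(s) - 0x10000
hi = chr(0xD800 + (cp >> 10))      # high surrogate U+D83D
lo = chr(0xDC00 + (cp & 0x3FF))    # low surrogate U+DE00
cesu8 = (hi.encode("utf-8", "surrogatepass")
         + lo.encode("utf-8", "surrogatepass"))
print(cesu8.hex())                 # 6 bytes: each surrogate becomes 3 bytes
```

Strict UTF-8 decoders reject the 6-byte form, which is why these "almost UTF-8" byte streams cause interop headaches.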