account42 5 days ago

A measly factor of 16 doesn't really make it worth having to deal with non-power-of-two sizes. You're also assuming that everything would have used the same number of bytes, when most sizes are chosen based on how much was needed at the time or in the foreseeable future. With 9-bit bytes, that would just have meant running out earlier for different things than with 8-bit bytes.

> IPv4: Everyone knows the story: IPv4 had 32-bit addresses, so about 4 billion total. [Footnote 4: Less due to various reserved subnets.] That's not enough in a world with 8 billion humans, and that's lead to NATs, more active network middleware, and the impossibly glacial pace of IPv6 roll-out. It's 2025 and Github—Github!—doesn't support IPv6. But in a world with 9-bit bytes IPv4 would have had 36-bit addresses, about 64 billion total. That would still be enough right now, and even with continuing growth in India and Africa it would probably be enough for about a decade more.

Only if you assume there is only one device per human, which is ridiculous.
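To put rough numbers on that, here is a quick sketch of the arithmetic. The 8 billion population figure comes from the quoted article; the variable names are just illustrative, and reserved subnets are ignored.

```python
# Address-space arithmetic for 32-bit vs. hypothetical 36-bit addresses.
WORLD_POPULATION = 8_000_000_000  # figure from the quoted article

ipv4_addresses = 2 ** 32       # real IPv4
nine_bit_addresses = 2 ** 36   # four 9-bit bytes (hypothetical)

growth_factor = nine_bit_addresses // ipv4_addresses
per_person_32 = ipv4_addresses / WORLD_POPULATION
per_person_36 = nine_bit_addresses / WORLD_POPULATION

print(f"32-bit: {ipv4_addresses:,} addresses, {per_person_32:.2f} per person")
print(f"36-bit: {nine_bit_addresses:,} addresses, {per_person_36:.2f} per person")
print(f"growth factor: {growth_factor}")
```

Even before subtracting reserved ranges, 36 bits works out to fewer than nine addresses per person, which a phone, a laptop, a router and a handful of smart devices would use up quickly.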

> Unicode: In our universe, there are 65 thousand 16-bit characters, which looked like maybe enough for all the world's languages, assuming you're really careful about which Chinese characters you let in. [Footnote 7: Known as CJK unification, a real design flaw in Unicode that we're stuck with.] With 9-bit bytes we'd have 262 thousand 18-bit characters instead, which would totally be enough—there are only 155 thousand Unicode characters today, and that's with all the cat smileys and emojis we can dream of. UTF-9 would be thought of more as a compression format and largely sidelined by GZip.

Which would be a lot worse than the current situation, because most text-like data only uses 8 bits per character. Text isn't just what humans type; it includes tons of computer-generated ASCII constructs.
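As a rough illustration of the size cost, the sketch below compares an ASCII-only snippet stored as UTF-8 with the same snippet in a hypothetical fixed-width 18-bit encoding (two 9-bit bytes per character). The sample string and the 18-bit encoding are assumptions made up for this example, not a real format.

```python
# Hypothetical comparison: ASCII-only text as UTF-8 vs. a fixed 18-bit encoding.
sample = '{"status": "ok", "items": [1, 2, 3]}'  # typical machine-generated text

assert sample.isascii()  # every character fits in a single UTF-8 byte

utf8_bits = len(sample.encode("utf-8")) * 8  # one 8-bit byte per character
fixed18_bits = len(sample) * 18              # two 9-bit bytes per character

size_ratio = fixed18_bits / utf8_bits

print(f"UTF-8: {utf8_bits} bits, fixed 18-bit: {fixed18_bits} bits")
print(f"size ratio: {size_ratio}")
```

For pure ASCII the ratio is 18/8 regardless of the sample, so this kind of data would more than double in size before any compression.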

Not to mention that it would become an active process to upgrade ASCII data to Unicode. Increased size would have been an argument against doing so for many files, and thus files and formats without Unicode support would have stuck around for much longer.
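For contrast, in our world the upgrade is passive: any ASCII byte sequence is already valid UTF-8 with the same meaning. A minimal check, using an arbitrary sample line chosen for this example:

```python
# ASCII data is already valid UTF-8: no conversion step and no size change.
ascii_bytes = b"GET /index.html HTTP/1.1\r\n"  # arbitrary ASCII sample

as_ascii = ascii_bytes.decode("ascii")
as_utf8 = ascii_bytes.decode("utf-8")

same_text = as_ascii == as_utf8                       # both decoders agree
same_bytes = as_utf8.encode("utf-8") == ascii_bytes   # round trip is byte-identical

print(same_text, same_bytes)
```

With a fixed-width 18-bit encoding, the same data would have to be rewritten to become Unicode, which is the active step described above.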

UTF-8 might have been an accident of history in many ways, but we really couldn't have wished for anything better.