zamadatix 6 days ago

Because we have 8-bit bytes, we're familiar with the famous or obvious cases where multiples of 8 bits ran out, and those cases sound a lot better with 12.5% extra bits. What's harder to see in this kind of thought experiment is what the famously obvious cases of multiples of 9 bits running out would have been. The article starts to think about some of these towards the end, but it's hard, since it's not immediately obvious how many others there might be (or, alternatively, why the total number of issues would be significantly different from what 8-bit bytes had). ChatGPT in particular isn't going to have a ton of training data about the problems with 9-bit multiples running out to hand-feed you.

It also works in the reverse direction. E.g., knowing that networking headers don't even care about byte alignment for sub-fields (a VLAN ID is 12 bits because it's packed with a few other fields into 2 bytes), I wouldn't be surprised if IPv4 had ended up with 3-byte addresses = 27 bits, instead of 4*9 = 36, since the designers were more worried about small packet overheads than about matching specific word sizes in certain CPUs.
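For concreteness, here's roughly how the real 802.1Q tag control field packs those sub-byte fields into two octets (a minimal Python sketch of the bit layout, nothing more):

    def pack_tci(pcp: int, dei: int, vid: int) -> bytes:
        # 802.1Q Tag Control Information: 3-bit priority (PCP),
        # 1-bit drop-eligible (DEI), 12-bit VLAN ID, all in 16 bits.
        assert 0 <= pcp < 8 and dei in (0, 1) and 0 <= vid < 4096
        tci = (pcp << 13) | (dei << 12) | vid
        return tci.to_bytes(2, "big")

    def unpack_tci(raw: bytes) -> tuple[int, int, int]:
        tci = int.from_bytes(raw, "big")
        return tci >> 13, (tci >> 12) & 0x1, tci & 0x0FFF

    print(pack_tci(pcp=5, dei=0, vid=100).hex())  # 'a064'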

pavpanchekha 6 days ago | parent | next [-]

Author here. Actually, I doubt we'd have picked 27-bit addresses. That's about 134M addresses, which is less than the US population (it's roughly the number of US households today?), and Europe was also relevant when IPv4 was being designed.

In any case, if we had chosen 27-bit addresses, we'd have hit exhaustion just a bit before the big telecom boom that built out most of the internet infrastructure that holds back transition today. Transitioning from 27-bit to, I don't know, 45-bit or 99-bit or whatever we'd have chosen next wouldn't be as hard as the IPv6 transition is today.

p_l 5 days ago | parent | next [-]

When 32 bits were chosen, it was because the protocol was deemed a temporary, experimental thing, so there was no need to invest in the proposed 128-bit addressing (from, IIRC, Vint Cerf) or the 160-bit addresses of the ITU/ISO protocols.

After all, we were supposed to switch off IPv4 in 1990...

Ekaros 6 days ago | parent | prev | next [-]

We might well have ended up with 27-bit addresses, if you don't really expect personal computer usage and essentially just want a proof of concept of interoperability that you'll later upgrade or replace.

Maybe there would have been a push to change at some point, as there would have been real limits in place.

windward 5 days ago | parent | prev | next [-]

>we'd have hit exhaustion just a bit before the big telecom boom that built out most of the internet infrastructure that holds back transition today

I think this does go both ways. It's hard to care about 3058, but it's nice that we started trying to solve y2k and 2038 while they were still merely painful. Wouldn't want a loop leading to a divide-by-zero in my warp drive.
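(For reference, the 2038 cutoff is just the largest 32-bit signed Unix timestamp; a quick Python check:)

    from datetime import datetime, timezone

    # Largest value a signed 32-bit time_t can hold.
    t_max = 2**31 - 1
    print(datetime.fromtimestamp(t_max, tz=timezone.utc))
    # 2038-01-19 03:14:07+00:00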

cchance 6 days ago | parent | prev | next [-]

why not 10-bit bytes and 40-bit addresses and nice base-2, metric-style measures :)

fc417fc802 6 days ago | parent [-]

If something is painful, you aren't doing it often enough, right? So my (completely uninformed) idea would be 27-bit addresses that are only routable on the local LAN, and then a second, optional 27-bit address to route between LANs on the WAN. The effective 54-bit address space would have been more than large enough, and if you support modularly extending addresses like that, there's no reason not to keep going beyond the initial two by eating into the payload.

Being completely uninformed I have no idea how severe the negative consequences of this scheme would be for the efficiency of routing hardware but I assume it would probably be catastrophic for some reason or another.
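To make the idea concrete, here's a purely hypothetical sketch of such a chained encoding (the continuation-flag layout and field sizes are made up for illustration and don't correspond to any real protocol):

    def encode_address(segments: list[int]) -> int:
        # Hypothetical: chain one or more 27-bit segments, spending one
        # extra bit per segment as a "more segments follow" flag.
        out = 0
        for i, seg in enumerate(segments):
            assert 0 <= seg < 2**27
            more = 1 if i < len(segments) - 1 else 0
            out = (out << 28) | (more << 27) | seg
        return out

    local_only = encode_address([0x123])          # routable on the LAN only
    cross_lan  = encode_address([0x456, 0x123])   # WAN segment + LAN segment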

MindSpunk 6 days ago | parent [-]

That's very loosely how IPv6 works. Your ISP will typically assign your router a prefix and will route any address starting with that 56- or 64-bit prefix to you. Then devices on your network pick the remaining bits and get their full address.
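Roughly, with a delegated /56 (sketched with Python's ipaddress module; the documentation prefix stands in for a real ISP assignment):

    import ipaddress

    # ISP delegates a /56 prefix to the customer router.
    delegated = ipaddress.ip_network("2001:db8:1234:ab00::/56")

    # The router carves out /64 subnets; a device fills in the low 64 bits.
    subnet = list(delegated.subnets(new_prefix=64))[1]
    device = subnet[1]
    print(subnet, device)
    # 2001:db8:1234:ab01::/64 2001:db8:1234:ab01::1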

fc417fc802 6 days ago | parent [-]

Well, IPv4 also used to work that way, back before address exhaustion. What I'm describing isn't an arbitrary allocation of a subset of a single fixed-width address, but rather two (or more) entirely disjoint address spaces.

mrheosuper 6 days ago | parent | prev [-]

Nothing NAT can't solve /s.

oasisbob 6 days ago | parent | prev | next [-]

The IPv4 networking case is especially weird to think about because the early internet used classful addressing before CIDR.

Thinking about the number of bits in the address is only one of the design parameters. The partitioning between network masks and host space is another design decision. The decision to reserve class D and class E space is yet another. More room for hosts is good. More networks in the routing table is not.

Okay, so if v4 addresses had been composed of four 9-bit bytes instead of four 8-bit octets, how would the early classful networks have shaken out? It doesn't do a lot of good if a class C network is still defined by the last byte.
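For reference, in our 8-bit world the class fell out of the leading bits of the first octet; a quick sketch of the actual pre-CIDR rules:

    def ipv4_class(first_octet: int) -> str:
        # Classify an address by its leading bits (classful, pre-CIDR).
        if first_octet < 128:   # 0xxxxxxx: class A, 8-bit network / 24-bit host
            return "A"
        if first_octet < 192:   # 10xxxxxx: class B, 16-bit network / 16-bit host
            return "B"
        if first_octet < 224:   # 110xxxxx: class C, 24-bit network / 8-bit host
            return "C"
        if first_octet < 240:   # 1110xxxx: class D (multicast)
            return "D"
        return "E"              # 1111xxxx: class E (reserved)

    print(ipv4_class(10), ipv4_class(172), ipv4_class(192))  # A B C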

JdeBP 6 days ago | parent | prev | next [-]

LLM dren also isn't going to provide anything on how wildly different the home computer revolution would have been with twice as big character ROMs; the personal computer revolution would have been with twice as big code pages 437, 850, and 1252 and an extra CGA attribute bit; the BBS era would have been with 9N1 telecommunications; ECMA-48 and ECMA-35 would have been with space for the C1 control characters with no need for alternative forms; ASCII and EBCDIC would have been without need for the national variants and room for some accented characters; and even how different the 6502 instruction set would have been.

With so many huge changes like those the alternate history by today would be far diverged from this universe.

The knock-on effect of EBCDIC having room for accented characters would have been the U.S.A. not changing a lot of placenames when the federal government made the GNIS in the 1970s and 1980s, for example. MS-DOS might have ended up with a 255-character command-tail limit, meaning that possibly some historically important people would never have been motivated to learn the response file form of the Microsoft LINK command. People would not have hit a 256-character limit on path lengths on DOS+Windows.

Teletext would never have needed national variants, would have had different graphics, would have needed a higher bitrate, might have lasted longer, and people in the U.K. would have possibly never seen that dog on 4-Tel. Octal would have been more convenient than hexadecimal, and a lot of hexadecimal programming puns would never have been made. C-style programming languages might have had more punctuation to use for operators.

Ð or Ç could have been MS-DOS drive letters. Microsoft could have spelled its name with other characters, and we could all be today reminiscing about µs-dos. The ZX Spectrum could have been more like the Oric. The FAT12 filesystem format would never have happened. dBase 2 files would have had bigger fields. People could have put more things on their PATHs in DOS, and some historically important person would perhaps have never needed to learn how to write .BAT files and gone on to a career in computing.

The Domain Name System would have had a significantly different history, with longer label limits, more characters, and possibly case sensitivity if non-English letters with quirky capitalization rules had been common in SBCS in 1981. EDNS0 might never have happened or been wildly different. RGB 5-6-5 encoding would never have happened; and "true colour" might have ended up as a 12-12-12 format with nothing to spare for an alpha channel. 81-bit or 72-bit IEEE 754 floating point might have happened.

"Multimedia" and "Internet" keyboards would not have bumped up against a limit of 127 key scancodes, and there are a couple of luminaries known for explaining the gynmastics of PS/2 scancodes who would have not had to devote so much of their time to that, and possibly might not have ended up as luminaries at all. Bugs in several famous pieces of software that occurred after 49.7 days would have either occurred much sooner or much later.

Actual intelligence is needed for this sort of science fiction alternative history construction.

topspin 5 days ago | parent | next [-]

Guess we should count our blessings that 7-bit bytes didn't become the de facto standard. Given that 7 bits is sufficient for ASCII and BCD, and given the popularity of the IBM 1401 in the 1960s, that's not at all implausible. The alternate history might have had only 2^28 (268,435,456) unique IPv4 addresses. The cynic in me wants you to be sure to include the inevitable "We'd be better off with 10-bit bytes" headline in the 9-bit alternate history.

I've always taken it as a given that we ended up with 8-bit bytes because it's the smallest power-of-two number of bits that accommodates ASCII and packed BCD. Back in the day, BCD mattered rather a lot. x86 has legacy BCD instructions, for example.

pavpanchekha 6 days ago | parent | prev | next [-]

Author here. Really great comment; I've linked it from the OP. (Could do without the insults!) Most of the changes you point out sound... good? Maybe having fewer arbitrary limits would have sapped a few historically significant coders of their rage against the machine, but maybe it would have pulled in a few more people by being less annoying in general. On colors, I did mention that in the post but losing an alpha channel would be painful.

zejn 6 days ago | parent [-]

Your first paragraph implies 8-bit bytes are a coincidence, which is not true. It was a design decision.

A byte is the smallest addressable memory location in a computer, and at the time that location held one character. The way characters are encoded is called a code. Early computers used 5 bits, which was not enough for letters plus numerals; 6 bits was not enough to encode numerals plus lower- and upper-case letters; and that eventually led to ASCII.

ASCII was also designed(!) to make some operations simple: e.g. turning text to upper or lower case only meant setting or clearing one bit if the code point was in a given range. This made some text operations much simpler and faster, which is why pretty much everybody adopted ASCII.
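For example, 'A' is 0x41 and 'a' is 0x61, so case comes down to a single bit (0x20); a minimal sketch:

    # ASCII puts upper and lower case exactly 0x20 apart.
    def to_upper(ch: str) -> str:
        c = ord(ch)
        if 0x61 <= c <= 0x7A:   # 'a'..'z'
            c &= ~0x20          # clear bit 5
        return chr(c)

    print(to_upper("q"), ord("A") ^ ord("a"))  # Q 32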

Doing 7-bit ASCII operations with 6-bit bytes is almost impossible, and doing them with 18-bit words is wasteful.

When IBM was deciding on byte size, a number of other options were considered, but the most advantageous was the 8-bit byte. Note that even an 8-bit byte was over-provisioning space for the character code, as ASCII was 7-bit. The extra bit offered quite a bit of room for extra characters, which gave rise to the various character encodings. This isn't something I would expect a person living in the USA to know about, but users of other languages used the upper 128 code points for local, language-specific characters.

When going with the 8-bit byte, they also made bytes individually addressable, making a 32-bit integer actually four 8-bit bytes. The 8-bit byte also allowed packing two BCD digits into one byte, and you could get them back out with a relatively simple operation.
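For instance, packed BCD keeps one decimal digit per 4-bit nibble, so splitting a byte back into digits is just a shift and a mask (minimal sketch):

    def pack_bcd(tens: int, ones: int) -> int:
        # Two decimal digits in one 8-bit byte, one per nibble.
        assert 0 <= tens <= 9 and 0 <= ones <= 9
        return (tens << 4) | ones

    def unpack_bcd(b: int) -> tuple[int, int]:
        return b >> 4, b & 0x0F

    print(hex(pack_bcd(4, 2)), unpack_bcd(0x42))  # 0x42 (4, 2)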

Even though 8 bits was more than needed, it was deemed cost-effective and "reasonably economical of storage space". And being a power of two allowed addressing an individual bit in a cost-effective way, if a programmer needed to do so.

I think your post discounts and underestimates the amount of performance gain and cost optimisation the 8-bit byte gave us at the time it mattered most, when computing power was scarce, and the fact that 8-bit bytes were just "good enough"; we wouldn't have gotten anything more usable from 9-, 10-, 12-, 14-, or 16-bit bytes.

On the other hand, you overestimate the gains with imaginary problems, such as IPv4, which didn't even exist in the 1960s (yes, we ran out of public address space quite some time ago; no, it's not really a problem, since even on pure IPv6 one has 6to4 NAT), or negative Unix time: how on Earth did you get the idea that someone would use negative Unix timestamps to represent historic dates, when most of the time we can't even be sure what year it was?

I think the scariest thing would be having an odd number of bits per byte; there would be a lot more people raging against the machine if the byte were 9 bits.

If you want to know why 8 bits, this is a good recap - https://jvns.ca/blog/2023/03/06/possible-reasons-8-bit-bytes... - along with the link to the book from the engineers who designed 8-bit bytes.

dan-robertson 6 days ago | parent [-]

Given how many early computers had 18-bit or 36-bit words, the possibility of 9-bit bytes doesn’t seem as unrealistic as you suggest. I don’t see 8-bit bytes as being so inevitable.

zejn 5 days ago | parent [-]

8-bit bytes were not inevitable, but they are definitely not natural either: they were designed. Clearly that design outcompeted the alternatives.

Dylan16807 6 days ago | parent | prev | next [-]

> The knock-on effect of EBCDIC having room for accented characters would have been the U.S.A. not changing a lot of placenames when the federal government made the GNIS in the 1970s and 1980s, for example.

I don't know about that; it had room for lots of accented characters via code pages. If that went unused, it probably would also have gone unused in the 9-bit version.

> Actual intelligence is needed for this sort of science fiction alternative history construction.

Why? We're basically making a trivia quiz, which rewards memorization far more than intelligence. And you actively don't want to get into the weeds of chaos-theory consequences, or you lose sight of the article you're writing.

p_l 5 days ago | parent [-]

You don't want to switch code pages while processing the data unless you add extra fields to indicate the code page, ISO 2022 style (or, in fact, old Baudot-shift style).

Dylan16807 5 days ago | parent [-]

Wouldn't the government department use the same code page at all times?

p_l 5 days ago | parent [-]

In the EBCDIC world, not exactly, but requiring all the place names to fit in one codepage is literally a return to the reason the accented names disappeared :)

Dylan16807 5 days ago | parent [-]

I need you to explain your argument better.

If you were saying they lost accents outside the main 50 or whatever, I'd understand why 8 bits were a problem. But you're saying they lost accents as a general rule, right? Why did they lose accents that were right there on the US code pages? Why would that reason not extend to a 9 bit semi-universal EBCDIC?

p_l 5 days ago | parent [-]

I read the original mention as claiming that it could have been solved by allowing multiple codepages.

But for processing data in one common database, especially back then, you wanted to stick to a single variant. The main reason for using a different codepage, if you didn't work in a language other than English, was to use APL (later, a special variant of the US codepage was added to support writing C, which for hysterical raisins wasn't exactly nice to do in the default US EBCDIC codepage).

So there would not have been an allowance for multiple codepages, if only because a codepage identifier would cut into the 72 characters left on a punched card after including the sort numbers.

andai 5 days ago | parent | prev | next [-]

That's an interesting argument about convenience discouraging interaction with the system. If everything just works, there's no need to tinker. If you stop tinkering, the world might miss out on some real magic.

deafpolygon 6 days ago | parent | prev | next [-]

I'm curious what would have happened with gaming... would we have gotten the NES?

windward 5 days ago | parent | prev [-]

I didn't recognise the word 'dren'. Asked an LLM.

marcosdumay 6 days ago | parent | prev | next [-]

Well, there should be half as many cases of multiples of 9 bits running out as there were of multiples of 8 bits running out.

I don't think this is enough of a reason, though.

foxglacier 6 days ago | parent [-]

If you're deciding between using 8 bits or 16 bits, you might pick 16 because 8 is too small. But making the same decision between 9 and 18 bits could lead to picking 9, because it's good enough at the time. So no, I don't think there would be half as many cases. They'd be different cases.

cdaringe 6 days ago | parent | prev [-]

There is certainly a well-known bias or fallacy that describes this.