jasonwatkinspdx 4 days ago

Not an expert but I happened to read about some of the history of this a while back.

ASCII has its roots in teletype codes, which were a development from telegraph codes like Morse.

Morse code is variable length, which made automatic telegraph machines and teletypes awkward to implement. The solution was the 5-bit Baudot code. Using a fixed-length code simplified the devices, and operators could type Baudot code directly on a five-key keyboard. Part of the code's design was to minimize operator fatigue.
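
To see why fixed length helps the machinery, here's a rough sketch in Python (the 5-bit values below are a made-up toy table, not the real Baudot/ITA2 assignments): the receiver just slices the bit stream into 5-bit chunks, with no need to hunt for symbol boundaries the way a Morse decoder must.

    # Toy illustration: decoding a fixed-length 5-bit code is just slicing.
    # The table is invented for this example, NOT the real Baudot/ITA2 code.
    TOY_TABLE = {
        0b00001: 'E',
        0b00011: 'A',
        0b00111: 'H',
        0b10000: 'T',
    }

    def decode_fixed_length(bits: str) -> str:
        # Every symbol is exactly 5 bits, so the boundaries are known in advance.
        chunks = [bits[i:i + 5] for i in range(0, len(bits), 5)]
        return ''.join(TOY_TABLE.get(int(chunk, 2), '?') for chunk in chunks)

    print(decode_fixed_length('00111' '00001' '00011' '10000'))  # -> HEAT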

Baudot code is why we refer to the symbol rate of modems and the like in baud, btw.

Anyhow, the next change came when, instead of the keyboard signaling directly on the wire, a typewriter-style machine was used to create a punched tape of code points, which would then be loaded into the telegraph machine for transmission. Since the keyboard was now decoupled from the wire code, there was more flexibility to add additional code points. This is where stuff like "Carriage Return" and "Line Feed" originate. This got standardized by Western Union and internationally.

By the time we get to ASCII, teleprinters are common, and the early computer industry adopted punched cards pervasively as an input format. And they initially did the straightforward thing of just using the telegraph codes. But then someone at IBM came up with a new scheme that would be faster when using punch cards in sorting machines. And that became ASCII eventually.

So zooming out, the story is that we started with binary codes and then adopted new schemes as the technology developed. All this happened long before the digital computing world settled on 8-bit bytes as a convention. ASCII as bytes is just a practical compromise between the older teletype codes and the newer convention.

pcthrowaway 4 days ago | parent

> But then someone at IBM came up with a new scheme that would be faster when using punch cards in sorting machines. And that became ASCII eventually.

Technically, the punch card processing technology was patented by inventor Herman Hollerith in 1884, and the company he founded wouldn't become IBM until 40 years later (though it was folded with three other companies into the Computing-Tabulating-Recording Company in 1911, which would then become IBM in 1924).

To be honest though, I'm not clear how ASCII came from anything used by the punch card sorting machines, since it wasn't proposed until 1961 (by an IBM engineer, but 32 years after Hollerith's death). Do you know where I can read more about the progression here?

jasonwatkinspdx 4 days ago | parent | next

It's right there in the history section of the wiki page: https://en.wikipedia.org/wiki/ASCII#History

> Work on the ASCII standard began in May 1961, when IBM engineer Bob Bemer submitted a proposal to the American Standards Association's (ASA) (now the American National Standards Institute or ANSI) X3.2 subcommittee.[7] The first edition of the standard was published in 1963,[8] contemporaneously with the introduction of the Teletype Model 33. It later underwent a major revision in 1967,[9][10] and several further revisions until 1986.[11] In contrast to earlier telegraph codes such as Baudot, ASCII was ordered for more convenient collation (especially alphabetical sorting of lists), and added controls for devices other than teleprinters.[11]

Beyond that I think you'd have to dig up the old technical reports.
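
One concrete way to see the "ordered for more convenient collation" point: in ASCII the digits and the letters each occupy a contiguous, alphabetically ordered run of code points, so a dumb byte-by-byte comparison already sorts same-case text correctly. A minimal Python sketch (case-insensitive or locale-aware sorting obviously needs more than this):

    # ASCII layout: '0'-'9' are 48-57, 'A'-'Z' are 65-90, 'a'-'z' are 97-122.
    # Each run is contiguous, so plain byte comparison gives alphabetical order.
    words = ["BAUDOT", "ASCII", "EBCDIC", "ANSI"]
    print(sorted(words, key=lambda w: w.encode("ascii")))
    # -> ['ANSI', 'ASCII', 'BAUDOT', 'EBCDIC']

    # Upper and lower case are exactly 32 apart, so changing case is one bit flip.
    print(ord('a') - ord('A'))  # -> 32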

zokier 4 days ago | parent | prev

IBM also notably used EBCDIC instead of ASCII for most of their systems

timsneath 4 days ago | parent | next

And just for fun, they also support what must be the weirdest encoding system -- UTF-EBCDIC (https://www.ibm.com/docs/en/i/7.5.0?topic=unicode-utf-ebcdic).

kstrauser 4 days ago | parent

Post that stuff with a content warning, would you?

> The base EBCDIC characters and control characters in UTF-EBCDIC are the same single byte codepoint as EBCDIC CCSID 1047 while all other characters are represented by multiple bytes where each byte is not one of the invariant EBCDIC characters. Therefore, legacy applications could simply ignore codepoints that are not recognized.

Dear god.

necovek 4 days ago | parent

That says roughly the following when applied to UTF-8:

"The base ASCII characters and control characters in UTF-8 are the same single byte codepoint as ISO-8859-1 while all other characters are represented by multiple bytes where each byte is not one of the invariant ASCII characters. Therefore, legacy applications could simply ignore codepoints that are not recognized."

(I know nothing of EBCDIC, but this seems to mirror UTF-8 design)
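
For what it's worth, the property is easy to demonstrate for UTF-8 at least: every byte of a multi-byte sequence has the high bit set (lead bytes 0xC2-0xF4, continuation bytes 0x80-0xBF), so no part of one can ever be mistaken for an ASCII character. A rough sketch:

    # In UTF-8, bytes below 0x80 appear only as the ASCII characters themselves;
    # every byte of a multi-byte sequence has the high bit set. A legacy
    # ASCII-only scanner therefore never confuses part of a multi-byte
    # character with, say, a slash or a quote.
    text = "naïve café ☃"
    for byte in text.encode("utf-8"):
        if byte < 0x80:
            print(f"ASCII      0x{byte:02x} {chr(byte)!r}")  # safe for old tools
        else:
            print(f"multi-byte 0x{byte:02x}")  # opaque, but never shadows ASCII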

stmpjmpr 4 days ago | parent | prev

*EBCDIC

zokier 4 days ago | parent

Thanks, fixed