dietr1ch 2 days ago

I don't think of base 10 being meaningful in binary computers. Indexing 1k entries needs 10 bits regardless of whether you wanted 1000 or 1024, and base 10 leaves some awkward holes.

In my mind base 10 only became relevant when disk drive manufacturers came up with drives of "weird" sizes (maybe they needed to reserve some space for internals, or the platters just didn't like powers of two) and realised that a base 10 system gave them better-looking marketing numbers. Who wants a 2.9TB drive when you can get a 3TB* drive for the same price?

userbinator 2 days ago | parent | next [-]

At the TB level, the difference is closer to 10%.

Three binary terabytes, i.e. 3 * 2^40, is 3298534883328 bytes, or 298534883328 more than 3 decimal terabytes. That difference is 298.5 decimal gigabytes, or 278 binary gigabytes.

Indeed, early hard drives had slightly more than even the binary size --- the famous 10MB IBM disk, for example, had 10653696 bytes, which was 167936 bytes more than 10 binary MB --- more than an entire 160KB floppy's worth of data.
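
A quick sanity check in C, using only the figures quoted above (nothing else assumed):

    #include <inttypes.h>
    #include <stdio.h>

    int main(void) {
        /* 3 binary TB vs 3 decimal TB */
        uint64_t tib3 = 3ULL << 40;        /* 3 * 2^40 = 3298534883328 */
        uint64_t tb3  = 3000000000000ULL;  /* 3 * 10^12 */
        uint64_t diff = tib3 - tb3;        /* 298534883328 bytes */
        printf("diff = %" PRIu64 " bytes = %.1f decimal GB = %.0f binary GB\n",
               diff, diff / 1e9, diff / (double)(1ULL << 30));

        /* The 10MB IBM disk: 10653696 bytes vs 10 binary MB */
        uint64_t surplus = 10653696ULL - (10ULL << 20);
        printf("surplus = %" PRIu64 " bytes; a 160KB floppy holds %" PRIu64 "\n",
               surplus, (uint64_t)(160ULL << 10));
        return 0;
    }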

mananaysiempre 2 days ago | parent | prev | next [-]

Buy an SSD, and you can get both at the same time!

That is to say, all the (high-end/“gamer”) consumer SSDs that I’ve checked use 10% overprovisioning and achieve that by exposing a given number of binary TB of physical flash (e.g. a “2TB” SSD will have 2×1024⁴ bytes’ worth of flash chips) as the same number of decimal TB of logical addresses (e.g. that same SSD will appear to the OS as 2×1000⁴ bytes of storage space). And this makes sense: you want a round number on your sticker to make the marketing people happy, you aren’t going to make non-binary-sized chips, and 10% overprovisioning is OK-ish (in reality, probably too low, but consumers don’t shop based on the endurance metrics even if they should).
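
A back-of-the-envelope check of that ~10% figure, in C. The 2TB numbers are the ones from this comment; this is just the binary/decimal gap, not any particular vendor's actual spare-area policy:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint64_t physical = 2 * (1ULL << 40);  /* 2 * 1024^4 bytes of flash */
        uint64_t logical  = 2000000000000ULL;  /* 2 * 1000^4 bytes exposed  */
        printf("overprovisioning = %.2f%%\n",
               100.0 * (physical - logical) / logical);  /* ~9.95% */
        return 0;
    }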

jdsully a day ago | parent | next [-]

"consumers don’t shop based on the endurance metrics even if they should"

Its been well over a decade now and neither I nor anyone I know has ever had an SSD endurance issue. So it seems like the type of problem where you should just go enterprise if you have it.

userbinator 2 days ago | parent | prev [-]

> you aren’t going to make non-binary-sized chips

TLC flash actually has a total number of bits that's a multiple of 3, but it and QLC are so unreliable that a significant number of extra bits goes to error correction and such.

SSDs haven't been real binary sizes since the early days of SLC flash, which didn't need more than basic ECC. (I have an old 16MB USB drive which actually has a user-accessible capacity of 16,777,216 bytes. The NAND flash itself stores 17,301,504 bytes.)
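
For what it's worth, that ratio works out to a classic small-page NAND layout; a minimal sketch of the division (the 16-spare-bytes-per-512-byte-sector reading is my inference, not something quoted from a datasheet):

    #include <inttypes.h>
    #include <stdio.h>

    int main(void) {
        uint64_t user = 16777216, raw = 17301504;
        /* raw/user = 33/32, i.e. 16 spare bytes per 512-byte sector
           (the classic 528-byte-page SLC layout) */
        printf("ratio = %.5f, spare per 512B sector = %" PRIu64 " bytes\n",
               (double)raw / user, (raw - user) / (user / 512));
        return 0;
    }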

fc417fc802 2 days ago | parent | prev | next [-]

> I don't think of base 10 being meaningful in binary computers.

They communicate via the network, right? And telephony has always counted in decimal bits, as opposed to power-of-two multiples of 8-bit bytes, IIUC. So these two schemes have always been in tension.

So at some point the Ki, Mi, etc. prefixes were introduced along with the b vs B suffixes, and that settled the issue over two decades ago. So why is this on the HN front page?!

A better question might be: why do we privilege the 8-bit byte? Shouldn't KiB officially have a subscript 8 on the end?
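
For anyone tripping over the notation, a minimal C sketch of the two prefix systems and the b/B distinction (the byte here is assumed to be 8 bits, which is exactly the privileging being questioned):

    #include <stdio.h>

    int main(void) {
        /* SI (decimal) vs IEC (binary) prefixes; b = bit, B = 8-bit byte */
        printf("1 kB  = %7llu bytes\n", 1000ULL);
        printf("1 KiB = %7llu bytes\n", 1024ULL);
        printf("1 MB  = %7llu bytes\n", 1000ULL * 1000);
        printf("1 MiB = %7llu bytes\n", 1024ULL * 1024);
        /* Link rates are decimal bits: 1 Mb/s = 1000000 bit/s = 125000 B/s */
        printf("1 Mb/s = %llu B/s\n", 1000000ULL / 8);
        return 0;
    }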

purplehat_ 2 days ago | parent [-]

To be fair, the octet as the byte has been dominant for decades. POSIX even has the definition "A byte is composed of a contiguous sequence of 8 bits." I would wager many software engineers don't even know that non-octet bytes were ever a thing, given that college CS curricula typically just teach that a byte is 8 bits.

I found some search results about Texas Instruments' digital signal processors using 16-bit bytes, and came across this blogpost from 2017 talking about implementing 16-bit bytes in LLVM: https://embecosm.com/2017/04/18/non-8-bit-char-support-in-cl.... Not sure if they actually implemented it, but it was surprising to me that non-octet bytes still exist, albeit in a very limited manner.

Do you know of any other uses for bytes that are not 8 bits?

zinekeller a day ago | parent | next [-]

> Do you know of any other uses for bytes that are not 8 bits?

For "bytes" as the term-of-art itself? Probably not. For "codes" or "words"? 5 bits are the standard in Baudot transmission (in teletype though). 6- and 7-bit words were the standards of the day for very old computers (ASCII is in itself a 7-bit code), especially on DEC-produced ones (https://rabbit.eng.miami.edu/info/decchars.html).

ahazred8ta a day ago | parent | prev | next [-]

Back in the days of octal notation, there were computers with a 12-bit word size that used sixbit characters (the early DEC PDP-8 and PDP-5, early CDC machines). 'Byte' was sometimes used for 6- and 9-bit halfword values.

fc417fc802 a day ago | parent | prev [-]

I wanted to reply with a bunch of DSP examples, but on further investigation the ones I just checked seem to very deliberately use the term "data word". That said, the C char type in these cases is one "data word" rather than 8 bits; I feel that ought to count as a non-8-bit byte regardless of the terminology in the docs.

NXP makes a number of audio DSPs with a native 24 bit width.

Microchip still ships chips in the PIC family with instruction widths of 12 and 14 bits; however, I believe the data memory on those chips is either 8 or 16 bits wide. I have no idea how to classify a machine where the instruction and data memory widths don't match.

Unlike POSIX, C merely requires that char be at least 8 bits wide, although I assume lots of real-world code would break if challenged on that particular detail.
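
A minimal sketch of how portable C code is supposed to probe this (CHAR_BIT comes from the standard <limits.h>; it is 8 on mainstream targets but can be 16 or 24 on DSPs like the ones above):

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        /* C guarantees CHAR_BIT >= 8; POSIX pins it to exactly 8. */
        printf("bits per byte here: %d\n", CHAR_BIT);
        /* sizeof counts in chars, so "bytes" below means CHAR_BIT-bit units */
        printf("sizeof(long) = %zu bytes = %zu bits\n",
               sizeof(long), sizeof(long) * (size_t)CHAR_BIT);
        return 0;
    }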

thfuran 2 days ago | parent | prev | next [-]

>I don't think of base 10 being meaningful in binary computers.

Okay, but what do you mean by “10”?

dietr1ch 2 days ago | parent | prev [-]

10, not to be confused with 10 or even the weird cousin, 10

jibal a day ago | parent | prev [-]

> I don't think of base 10 being meaningful in binary computers.

First, you implicitly assumed a decimal number base in your comment.

Second: of course it's meaningful. It's also relevant, since humans use binary computers and numeric input and output in text is almost always decimal.