pjdesno 2 days ago

I had a computer architecture prof (a reasonably accomplished one, too) who thought that all CS units should be binary, e.g. Gigabit Ethernet should be quoted as roughly 954 Mibit/s, not 1000 Mbit/s.

I disagreed strongly - I think X-per-second should be decimal, to correspond to hertz. But for quantities, binary seems better. (Modern CS papers tend to use MiB, GiB, etc. as abbreviations for the binary units.)

Fun fact - for a long time consumer SSDs had roughly 7.37% over-provisioning, because that's what you get when you put X GiB (binary) of raw flash into a box and advertise it as X GB (decimal) of usable storage. (Probably a bit less, as a few blocks of the X GiB of flash would likely be DOA.) With TLC, QLC, and SLC-mode caching in modern drives, the numbers aren't as simple anymore, though.
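
A quick check of that figure in Python:

    # Over-provisioning you get "for free" by building X GiB of raw flash
    # and advertising it as X GB (decimal) of usable storage.
    raw_bytes = 2**30         # 1 GiB of raw flash
    advertised_bytes = 10**9  # sold as 1 GB
    spare = raw_bytes - advertised_bytes
    print(f"over-provisioning: {spare / advertised_bytes:.2%}")  # 7.37%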

thayne 2 days ago | parent | next [-]

It makes it inconvenient to do things like estimate how long it will take to transfer a 10 GiB file, both because of the difference between G and Gi, and because one is in bytes and the other is in bits.

There are probably cases where corresponding to Hz is useful, but for most users I think 119 MiB/s is more useful than 1 Gbit/s.
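
A minimal sketch of that estimate in Python (ignoring protocol overhead, which makes real transfers slower):

    # How long does a 10 GiB file take on a 1 Gbit/s link?
    file_bytes = 10 * 2**30       # 10 GiB
    link_bits_per_s = 10**9       # 1 Gbit/s (decimal, per the spec)
    link_bytes_per_s = link_bits_per_s / 8

    print(f"link speed: {link_bytes_per_s / 2**20:.1f} MiB/s")   # ~119.2 MiB/s
    print(f"transfer:   {file_bytes / link_bytes_per_s:.1f} s")  # ~85.9 s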

ralferoo 2 days ago | parent | prev | next [-]

There's a good reason that gigabit Ethernet is 1000 Mbit/s, and that's because it was defined in decimal from the start. We had 1 Mbit/s, then 10 Mbit/s, then 100 Mbit/s, then 1000 Mbit/s, and now 10 Gbit/s.

Interestingly, starting from 10 Gbit/s we now also have binary divisions: 5 Gbit/s and 2.5 Gbit/s.

Even at slower speeds, these were traditionally always decimal based - we call it 50 bps, 100 bps, 150 bps, 300 bps, 1200 bps, 2400 bps, 9600 bps, 19200 bps - and then we had the odd one out, 56k (actually 57600 bps), where the k means 1024 (approximately); it was the first and last common speed to use a base-2 kilo. Once you get into Mbps it's back to decimal.
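
A quick check of that "approximately" at the Python prompt:

    >>> 57600 / 1024   # "k" as 1024: not exact
    56.25
    >>> 57600 / 1000   # "k" as 1000: not exact either
    57.6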

icedchai 2 days ago | parent | next [-]

To add further confusion, 57600 was actually a serial port speed, from the computer to the modem, which was higher than the maximum physical line (modem) speed. Many people ran higher serial port speeds to take advantage of compression (115200 was common.)

56000 BPS was the bitrate you could get out of a DS0 channel, which is the digital version of a normal phone line. A DS0 is actually 64000 BPS, but 1 bit out of 8 is "robbed" for overhead/signalling. An analog phone line got sampled to 56000 BPS, but lines were very noisy, which was fine for voice but not for data.

7 bits per sample * 8000 samples per second = 56000, not 57600. That was the theoretical maximum bandwidth! The FCC also capped modems at 53K or something, so you couldn't get 56000 even on a good day.

krater23 2 days ago | parent | prev | next [-]

This has nothing to do with 1024. It has to do with 1200 and its multiples, and with the 14k and 28k modems, where everyone just cut off the last few hundred bits per second because you never reached that speed anyway.

jiveturkey 2 days ago | parent | prev [-]

> that's because it was defined in decimal from the start

I mean, that's not quite it. By that logic, had memory been defined in decimal from the start (happenstance), we'd have 4000 byte pages.

Now Ethernet is interesting ... the data rates are defined in decimal, but almost everything else about it is octets! Starting with the preamble. But the payload is up to an annoying 1500 (decimal) octets. The _minimum_ frame length is defined for CSMA/CD to work, but the max could have been anything.

soneil 2 days ago | parent | prev | next [-]

This is the bit (sic) that drives me nuts.

RAM had binary sizing for perfectly practical reasons. Nothing else did (until SSDs inherited RAM's architecture).

We apply it to all the wrong things mostly because the first home computers had nothing but RAM, so binary sizing was the only explanation that was ever needed. And 50 years later we're sticking to that story.

gpm 2 days ago | parent | next [-]

RAM having binary sizing is a perfectly good reason for hard drives having binary-sized sectors (more efficient swap, memory maps, etc.), which in turn justifies hard disks as a whole being sized in binary.

wvenable 2 days ago | parent | prev | next [-]

Literally every number in a computer is base-2, not just RAM addressing. Everything is ultimately bits, pins, and wires. The physical and logical interface between your oddly sized disk and your computer? Also some base-2.

fc417fc802 2 days ago | parent | next [-]

Even the disk sectors are in base 2. It's only the marketing that's in base 10.

hmry 2 days ago | parent | prev [-]

Not everything is made from wires and transistors. And that's why these things are usually not measured in powers of 2:

- magnetic media

- optical media

- radio waves

- time

There are good reasons for having power-of-2 sector sizes (they need to get loaded into RAM), but there's really no compelling reason to have a power-of-2 number of sectors. If you can fit 397 sectors, only putting in 256 is wasteful.

wvenable 2 days ago | parent [-]

Since everything ultimately ends up inside a base-2 computer, carried across a base-2 bus, it still makes sense to measure these media that way even if they aren't subject to the same considerations.

The choice would be effectively arbitrary; the number of actual bits or bytes is the same regardless of the multiplier that you use. But since it's for a computer, it makes sense to use units that are comparable (e.g. RAM and HD).

fc417fc802 2 days ago | parent [-]

Buses and networking fit best with base 10 bits (not bytes) per second for reasons that are hopefully obvious. But I agree with you that everything else naturally lends itself to base 2.

krater23 2 days ago | parent | prev [-]

Nope. The first home computers like the C64 had both RAM and sectors on disk - 256 bytes in the C64's case. And there it is again, a power of two, just smaller than 1024.

Only later did some marketing assholes figure they could better sell their hard drives by lying about the size, weaseling out of the legal issues by redefining the units.

wtallis 2 days ago | parent | prev | next [-]

NAND flash has overprovisioning even on a per-die basis, e.g. Micron's 256Gbit first-generation 3D NAND had 548 blocks per plane instead of 512, and the pages were 16384+2208 bytes. That left space both for defects and ECC while still being able to provide at least the nominal capacity (in power-of-two units) with good yield, but meant the true number of memory cells was more than 20% higher than implied by the nominal capacity.

The decimal-vs-binary discrepancy is used more as slack space to cope with the inconvenience of having to erase whole 16MB blocks at a time while allowing the host to send write commands as small as 512 bytes. Given the limited number of program/erase cycles that any flash memory cell can withstand, and the enormous performance penalty that would result from doing 16MB read-modify-write cycles for any smaller host writes, you need way more spare area than just a small multiple of the erase block size. A small portion of the spare area is also necessary to store the logical to physical address mappings, typically on the order of 1GB per 1TB when tracking allocations at 4kB granularity.
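
A back-of-the-envelope check of both figures in Python (the die geometry is as quoted above; the 4-byte mapping entry is an assumption for illustration):

    # Raw vs. nominal capacity for the Micron 3D NAND example.
    nominal_page = 16384         # data bytes per page
    raw_page = 16384 + 2208      # data + spare (ECC) bytes per page
    overhead = (raw_page * 548) / (nominal_page * 512) - 1
    print(f"raw cells above nominal: {overhead:.1%}")  # ~21.5%

    # FTL mapping table for a 1 TB drive tracked at 4 kB granularity,
    # assuming a 4-byte physical address per entry (my assumption).
    entries = 10**12 // 4096
    print(f"mapping table: {entries * 4 / 10**9:.2f} GB")  # ~0.98 GB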

wmf 2 days ago | parent | prev | next [-]

An even bigger problem is that networks are measured in bits while RAM and storage are in bytes. I'm sure this leads to plenty of confusion when people see a 120 meg download on their 1 gig network.

(The old excuse was that networks are serial but they haven't been serial for decades.)

bombcar 2 days ago | parent | prev | next [-]

Wire speeds and bitrate and baud and all that stuff are vastly confusing when you start looking into it, because it's hard to even define what a "bit on the wire" is when everything has to be encoded in such a way that it can be decoded. (Specialized protocols can go FASTER than normal ones on the same wire and the same mechanism if they can guarantee certain things - like never having four zero bits in a row.)
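
As a concrete illustration: gigabit Ethernet over fiber (1000BASE-X) uses 8b/10b line coding and 10GBASE-R uses 64b/66b, so the symbol rate on the wire is higher than the data rate. A quick Python sketch:

    # Line coding: the wire carries more symbols than data bits.
    def data_rate(line_rate_baud: float, data_bits: int, coded_bits: int) -> float:
        return line_rate_baud * data_bits / coded_bits

    print(data_rate(1.25e9, 8, 10))      # 1000BASE-X: 1.25 GBd -> 1 Gbit/s
    print(data_rate(10.3125e9, 64, 66))  # 10GBASE-R: ~10.3 GBd -> 10 Gbit/s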

waffletower 2 days ago | parent | prev [-]

I can see a precision argument for binary represented frequencies. A systems programmer would value this. A musician would not.

fsckboy 2 days ago | parent | next [-]

Musicians use numbering systems that are actually far more confused than anything discussed here. How many notes in an OCTave? "Do re mi fa so la ti do" is eight, but that last do is part of the next octave, so an octave is 7 notes. (If we count transitions instead, same thing: starting at the first do as zero, re is 1, ... again 7.)

The same confusion, and even more, is engendered when talking about "fifths" etc.

waffletower 2 days ago | parent | next [-]

The 7-note scale you suggest (do re mi fa so la ti do) is made up of different intervals (2 2 1 2 2 2 1) in the 12-fold equal-tempered scale. There are infinite ways of exploring an octave in music, but unfortunately listener demand for such exploration is near infinitesimal.

fsckboy 2 days ago | parent [-]

don't you mean 11-fold? ... oh wait, they aren't even consistent

waffletower 2 days ago | parent [-]

They sum to 12

fsckboy 2 days ago | parent [-]

actually they multiply: the 12th root of 2, raised to the 12th
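
In Python terms (a minimal sketch, using the step pattern quoted above):

    # Scale steps sum to 12 semitones; the equal-tempered frequency
    # ratios multiply out to an octave (a factor of 2).
    import math

    steps = [2, 2, 1, 2, 2, 2, 1]
    semitone = 2 ** (1 / 12)

    print(sum(steps))                                # 12
    print(math.prod(semitone ** s for s in steps))   # ~2.0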

jltsiren 2 days ago | parent | prev [-]

You can blame the Romans for that, as they practiced inclusive counting. Their market days occurring once every 8 days were called nundinae, because the next market day was the ninth day from the previous one. (And by the same logic, Jesus rose from the dead on the third day.)

AlotOfReading 2 days ago | parent | prev [-]

Musicians often use equal temperament, so they have their own numerical crimes to answer for.

waffletower 2 days ago | parent [-]

Touché, it is appropriate to describe near-compulsory equal temperament (à la MIDI) as a crime.