userbinator a day ago

If you want to keep software working on systems with a 9-bit byte or other weirdness, that's entirely on you. No one else needs or wants the extra complexity. Little endian is logical and won, big endian is backwards and lost for good reason. (Look at how arbitrary precision arithmetic is implemented on a BE system; chances are it's effectively LE anyway.)
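(To make "effectively LE anyway" concrete: a toy multi-limb add in Python. The function name and the 32-bit limb size are made up for illustration, but least-significant-limb-first is the internal layout used by bignum implementations such as GMP and CPython's int.)

```python
def big_add(a, b):
    """Add two arbitrary-precision numbers stored as lists of 32-bit
    limbs, least-significant limb at index 0 -- i.e. little-endian."""
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        out.append(s & 0xFFFFFFFF)  # keep the low 32 bits of this limb
        carry = s >> 32             # carry ripples toward higher limbs
    if carry:
        out.append(carry)
    return out

# 0xFFFFFFFF + 1 = 0x1_00000000: the carry propagates upward, which is
# a simple forward loop only because limb 0 is the least significant.
assert big_add([0xFFFFFFFF], [1]) == [0, 1]
```

The carry has to flow from least to most significant, so storing limbs the other way around would force the loop to run backwards through memory.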

ozgrakkurt a day ago | parent | next [-]

You don't need the "Little endian is logical" part.

Most people just don't care and can't be bothered to spend time making sure code is "endian portable".

Couldn't care less if it is easier to "read in crash dumps" TBH.

I don't even write server code to be portable to anything other than x86_64 or lately just use avx512 without any fallback since it is not needed in practice.

I'm not doing anything people care about probably but I imagine it is a similar feeling for people that do.

I would rather have small software that compiles fast than add 50 #ifdefs to it, make it burn my eyes, and spend time wondering "but would this work on big endian?"
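(For contrast, the usual way to avoid those #ifdefs entirely is to decode fixed-endian wire fields with shifts, which behave identically on any host. A small Python sketch of the idiom; `read_u32_le` is a made-up name, and `struct.pack` stands in for bytes arriving off the wire.)

```python
import struct

def read_u32_le(buf):
    """Decode a 32-bit little-endian field with explicit shifts.
    The expression is host-endianness-independent: it names byte
    positions, not memory layout, so no #ifdef is needed."""
    return buf[0] | buf[1] << 8 | buf[2] << 16 | buf[3] << 24

wire = struct.pack("<I", 0xCAFEBABE)  # simulated little-endian input
assert read_u32_le(wire) == 0xCAFEBABE
```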

yjftsjthsd-h a day ago | parent | prev | next [-]

> Little endian is logical and won, big endian is backwards and lost for good reason.

No, BE is logical, but LE is efficient (for machines).

adrian_b a day ago | parent | next [-]

BE is not logical in any way, it is just a tradition, like the use of decimal numbers.

The use of automatic computers has forced a transition from the use of arbitrary conventions that did not have any logical motivation to the most efficient methods of data representation, like binary numbers in little-endian format.

Little-endian is more efficient even when you compute by pen on paper, if it feels awkward that is just because you were taught differently as a child.

There are special circumstances when a big-endian representation is the right choice, e.g. when you interpret a bit string as a binary polynomial, in order to implement an error-detection code with a CRC. However, for general-purpose numbers, little-endian is the optimum choice.

lowbloodsugar a day ago | parent | prev [-]

No, BE is intuitive for humans who write digits with the highest power on the left.

LE is logical which is also why it is more efficient and more intuitive for humans once they get past “how we write numbers with a pencil”.

yjftsjthsd-h a day ago | parent | next [-]

No, BE is logical because it puts bits and bytes in the same order. That humans use BE is also nice but secondary to that. I don't have strong feelings about whether fifty-one thousand nine hundred sixty-six is written as 0xcafe or 0xefac, but I feel quite comfortable suggesting that 0xfeca is absurd. (FWIW, this is a weak argument for what computers should do; if LE is more efficient for machines then let them use it)

Edit: switched example to hex

Edit2: actually this is still slightly out of whack, but I don't feel like switching to binary so take it as a loose representation rather than literal
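(For the record, Python's struct module makes the two layouts in question easy to see; 0xcafe is just the example value from the comment above.)

```python
import struct

# Same 16-bit value, two memory layouts: little-endian puts the low
# byte at the lower address, big-endian puts the high byte there.
le = struct.pack("<H", 0xCAFE)
be = struct.pack(">H", 0xCAFE)
assert le == b"\xfe\xca"  # dumped lowest-address-first, reads "fe ca"
assert be == b"\xca\xfe"  # reads "ca fe", matching the written numeral
```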

thinking_cactus a day ago | parent | next [-]

My contribution: largest-order-first (big endian) makes sense in real life because people tend to make quick judgements in unreliable situations. For example, take the announcement that you're receiving $132,551. You wouldn't want to hear something like "Hello! You have been awarded one and fifty and five hundred and... and one hundred thousand dollars!"; you want to hear "You have been awarded one hundred and thirty-two thousand and ... dollars!" The largest sums change decisions dramatically, so it makes sense that they come first.

On computers however, we basically always use exact arithmetic and exact, fixed logic where learning the higher order doesn't help (we're not doing approximations and decisions based on incomplete information), in fact for mathematical reasons in the exact cases it's usually better to compute and utilize the lowest bits first (e.g. in the case of sums and multiplication algos I am familiar with). [note1]

Overall I'm slightly surprised some automatic/universal translation methods for the most common languages haven't been made, although I guess there may be some significant difficulties or impossibilities (for example, if you send a bunch of bits/bytes outside, there's no general way to predict the endianness they should be in). I suspect LLMs will make this task much easier (without a more traditional universal translation algorithm).

[note1] Also, the time required to receive all bits from say a 64b number as opposed to the first k bits tends to be a negligible or even 0 difference, in both human terms (receiving data over a network) and machine terms (receiving data over a bus; optimizing an algorithm that uses numbers in complicated ways; etc.), again different from human communication and thought.

Joker_vD a day ago | parent | next [-]

> My contribution: largest-order-first (big endian) makes sense in real life because people tend to make quick judgements in unreliable situations. For example, take the announcement that you're receiving $132551 dollars. You wouldn't want to hear something like "Hello! You have been awarded one and fifty and five hundred and... and one hundred thousand dollars!", you want to hear "You have been awarded One hundred and thirty two thousand and ... dollars!" The largest sums change decisions dramatically so it makes sense they come first.

And yet in Arabic, numbers are written in order from the least to the most significant digit, even if they are not really pronounced that way once you get to the hundreds and up: "1234" is read as essentially "one thousand two hundred four-and-thirty", the same way German does it. And yes, the order looks the same as in e.g. English, but Arabic is written right to left. So, no, it's absolutely fine to write numbers in little endian even in a language that pronounces them the big-endian or even the mixed-endian way.

20 hours ago | parent [-]
[deleted]
Veserv a day ago | parent | prev [-]

There are plenty of ways for language to be better now that we know far more about arithmetic than when number words were created.

"One Five Five Two Three One" is 6 words, 6 syllables long, whereas "One Hundred and Thirty Two Thousand" is 6 words, 9 syllables long and conveys less information. Even shortening it to "One Hundred Thirty Two Thousand" is still 5 words, 8 syllables long and conveys less information.

You can also easily convey high order digits first by using an unambiguous "and/add" construction: "Thousand Two Three One Add One Five Five". You have now conveyed the three high order digits in 5 words, 5 syllables. You also convey the full number in 9 words, 9 syllables, in contrast to "One Hundred Thirty Two Thousand One Hundred Fifty Five" which is 9 words, 14 syllables.

You could go even further and express things in pseudo-scientific notation which would be even more general and close to as efficient. "Zero E Three (10^3) Two Three One" which is 6 words, 6 syllables, but no longer requires unique separator words like "Thousand", "Million", "Billion", etc. This shows even greater efficiency if you are conveying "One Hundred Thirty Thousand" which would be something more like "Zero E Four (10^4) Three One" since the scientific notation digit position description is highly uniform.

This distinction might seem somewhat arbitrary since this just seems like it is changing the order for the sake of things. However, the advantage of little-endian description is that it is non-contextual. When you say the number "One" it literally always means the one's place "One". If you wish to speak of a different positional "One" you would prefix it with the position e.g. "Zero E Three (10^3) One". In contrast, in the normal way of speaking numbers "One" could mean any positional one. Are you saying "One Hundred", "One Thousand", "One Hundred Million"? You need to wait for subsequent words to know what "One" is being said. Transcription must fundamentally buffer a significant fraction of the word stream to disambiguate.

It also results in the hilariously duplicative "One Hundred Thirty Two Thousand One Hundred Fifty Five" which has positional signifiers for basically every word: "One Hundred Thir-ty Thousand One Hundred Fif-ty Five". Fully 8 of the 14 syllables are used for positional disambiguation to reduce necessary lookahead. "And/Add" constructions get you that for a fraction of the word and syllable count. They allow arbitrary chunking since you can separate digit streams on any boundary. It also reinforces the fact that numbers are just composites of their components, which may help with numeracy.

Little endian is actually just better in every respect, except for compatibility and familiarity, if we use our modern robust knowledge of arithmetic to formulate the grammar rules.

dataflow a day ago | parent | prev | next [-]

> No, BE is logical because it puts bits and bytes in the same order.

This sounds confused. The "order" of bits is only an artifact of our human notation, not some inherent order. If you look at how an integer is implemented in hardware (say in a register or in combinational logic), you're not going to find the bits being reversed every byte.

yjftsjthsd-h a day ago | parent [-]

Okay, if you get everyone to write bits the other way I'll endorse LE as intuitive/logical. Until then, I want my bits and bytes notated uniformly.

dataflow a day ago | parent | next [-]

> Okay, if you get everyone to write bits the other way I'll endorse LE as intuitive/logical.

You're still confused, unfortunately. (Note: In everything that follows, I'm just pretending "Arabic numerals" came from Arabic. The actual history is more complicated but irrelevant to my point, so let's go with that.)

First, you're confusing intuitive with logical. They are not the same thing. E.g., survivorship bias (look up the whole WWII plane thing) is unintuitive, but extremely logical.

Second, even arguing intuitiveness here doesn't really make sense, because the direction of writing numerals is itself intrinsically arbitrary. If our writing system was such that a million dollars was written as "000,000,1$", suddenly you wouldn't find big-endian any more intuitive.

In fact, if you were an Arabic speaker and your computer was in Arabic (right to left) rather than English (left to right), then your hex editor would display right-to-left on the screen, and you would already find little-endian intuitive!

In other words, the only reason you find this unintuitive is that you speak English, which is (by unfortunate historical luck) written in "big-endian" form! Note that this has nothing to do with right-to-left versus left-to-right, but rather with whether the place values increase or decrease in the same direction as the prose. In Arabic, place values increase in the direction of the prose, which makes little-endian entirely intuitive to an Arabic speaker!

To put it another way, arguing LE is unintuitive is like claiming something being right-handed is somehow more intuitive than left-handed. If that's true, it's because you're used to being right-handed, not because right-handedness itself is somehow genuinely more intuitive. (And neither of these has anything to do with one being more or less logical than the other.)

userbinator a day ago | parent | prev [-]

Until then, I want my bits and bytes notated uniformly.

AFAIK it was only IBM whose CPUs were consistently BE for both bit and byte order (i.e. bit 0 is also the most significant bit.) Every other CPU which is BE for bytes is still LE for bits (bit 0 least significant.)

zephen a day ago | parent | prev [-]

Your example is only for dumping memory.

> this is a weak argument for what computers should do; if LE is more efficient for machines then let them use it

Computers really don't care. Literally. Same number of gates either way. But for everything besides dumping it makes sense that the least significant byte and the least significant bit are numbered starting from zero. It makes intuitive mathematical sense.

userbinator a day ago | parent | next [-]

Same number of gates either way

Definitely not, which is why many 8-bit CPUs are LE. Carries propagate upwards, and incrementers are cheaper than a length-dependent subtraction.
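(A sketch of why low-byte-first helps a byte-serial machine: the add can walk addresses upward with the carry in tow, with no need to locate the high end of the number first. Toy Python, not any particular CPU's datapath.)

```python
def add_bytes_le(mem_a, mem_b):
    """Byte-serial add the way a little-endian 8-bit CPU does it:
    visit addresses 0, 1, 2, ... in increasing order, propagating
    the carry upward as we go."""
    out, carry = bytearray(), 0
    for x, y in zip(mem_a, mem_b):
        s = x + y + carry
        out.append(s & 0xFF)   # low 8 bits stay at this address
        carry = s >> 8         # carry moves to the next (higher) byte
    return bytes(out), carry

# 0x00FF + 0x0001 = 0x0100; in little-endian memory that is
# FF 00 + 01 00 -> 00 01, with the carry handled mid-walk.
assert add_bytes_le(b"\xff\x00", b"\x01\x00") == (b"\x00\x01", 0)
```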

zephen 15 hours ago | parent [-]

So, to be clear, I was writing about when you design a computer. It truly is the same number of gates either way. I have written my fair share of Verilog. At one level, it's just a convention.

For the use of a computer, yes, if you are doing multi-word arithmetic, it can matter.

OTOH, to be perfectly fair and balanced, multi-word comparisons work better in big-endian.
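(This one is easy to demonstrate: big-endian byte strings sort the same way as the integers they encode, so a memcmp-style lexicographic comparison just works; little-endian ones don't. A Python sketch using struct, with arbitrary example values.)

```python
import struct

a, b = 0x0102, 0x0201

# Big-endian encodings compare like the numbers themselves...
assert (struct.pack(">H", a) < struct.pack(">H", b)) == (a < b)

# ...but little-endian encodings compare the low byte first,
# giving the wrong answer for these values.
assert (struct.pack("<H", a) < struct.pack("<H", b)) != (a < b)
```

This is why big-endian keys are the norm in sorted on-disk formats and network routing tables, where a raw byte comparison has to order the values correctly.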

yjftsjthsd-h a day ago | parent | prev | next [-]

Not only dumping, but yes, I agree it only matters when humans are in the loop. My most annoying encounters with endianness were when writing and debugging assembly, and I assure you dumping memory was not the only pain point.

zephen 15 hours ago | parent [-]

I've done plenty of assembly language. It was the bulk of my career for over 20 years; little endian was just fine, and big endian was not.

Joker_vD a day ago | parent | prev [-]

> Computers really don't care. Literally. Same number of gates either way.

Eh. That depends; the computer architectures used to be way weirder than what we have today. IBM 1401 used variable-length BCDs (written in big-endian); its version of BCDIC literally used numbers from 1 to 9 as digits "1" to "9" (number 0 was blank/space, and number 10 would print as "0"). So its ADD etc. instructions took pointers to the last digits of numbers added, and worked backwards; in fact, pretty much all of indexing on that machine moved backwards: MOV also worked from higher addresses down to lower ones, and so on.

zephen a day ago | parent | prev [-]

> BE is intuitive for humans who write digits with the highest power on the left.

But only because when they dump memory, they start with the lowest address, lol.

Why don't these people reverse number lines and Cartesian coordinate systems while they're at it?

an-honest-moose 20 hours ago | parent [-]

A lot of graphics APIs do actually reverse the y-coordinate for historical reasons.

zephen 15 hours ago | parent [-]

Right. I've done plenty of postscript/PDF.

But 99% of the time the x-coordinate and the number increment from left to right.

pezezin a day ago | parent | prev [-]

LE is not "logical", it won because the IBM PC compatible won, simple as that.