UniverseHacker 3 days ago

ECC is not easy to explain, and it sounds like a tautology rather than an explanation ("error correction is done with error correction") unless you give a full technical explanation of exactly what ECC is doing.

marcellus23 3 days ago | parent [-]

Regardless of whether the parent's sentence is a tautology, the explanation in the article is categorically wrong.

bawolff 3 days ago | parent | next [-]

Categorically might be a bit much. Duplicating bits with majority voting is an error correction code, it's just not a very efficient one.

Like, it's wrong, but it's not totally out-of-this-world wrong. Or more specifically, it's in the correct category.

vlovich123 3 days ago | parent [-]

It's categorically wrong to say that that's how memory is error corrected in classical computers because it is not and never has been how it was done. Even for systems like S3 that replicate, there's no error correction happening in the replicas and the replicas are eventually converted to erasure codes.

bawolff 3 days ago | parent [-]

I'm being a bit pedantic here, but it is not categorically wrong. Categorically wrong doesn't just mean "very wrong"; it is a specific type of being wrong, a type that this isn't.

Repetition codes are a type of error correction code. They are thus in the category of error correction codes. Even if a repetition code is not the right error correction code, it is in the correct category, so it is not a categorical error.

Dylan16807 3 days ago | parent | next [-]

I interpret that sentence as talking about real computers, which does put it outside the category.

bawolff a day ago | parent [-]

That's the definition of a normal error, not a category error.

If you disagree, what do you see as something that would be in the correct category but wrong in the sentence?

The normal definition of category error is something that is so wrong it doesn't make sense on a deep level. For example, if they had suggested quicksort as an error correction code.

The mere fact we are talking about "real" computers should be a tip-off that it's not a category error, since people can build new computers. Category errors are wrong a priori. It's possible someone tomorrow will build a computer using a repetition code for error correction. It is not possible they will use quicksort for ECC. A repetition code is in the right category of things even if it is the wrong instance. Quicksort is not in the right category.

Dylan16807 a day ago | parent [-]

> The normal definition of category error is something that is so wrong it doesn't make sense on a deep level.

Can you show me a definition that says that about the phrase "categorically wrong"?

And I think the idea that computers could change is a bit weak.

cycomanic 3 days ago | parent | prev [-]

Well, it's about as categorically wrong as saying quantum computers use error correction algorithms similar to those of classical computers. Categorically, both are error correction algorithms.

vlovich123 3 days ago | parent | prev | next [-]

Yeah, I couldn't quite remember if ECC is just Hamming codes or is using something more modern like fountain codes, although those are technically FEC. So rather than state something incorrect, I went with the tautology.

cortesoft 3 days ago | parent | prev [-]

Eh, I don’t think it is categorically wrong… ECCs are based on the idea of sacrificing some capacity by adding redundant bits that can be used to correct for some number of errors. The simplest ECC would be just duplicating the data, and it isn’t categorically different than real ECCs used.
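
To make the "redundant bits" idea concrete, here is a minimal sketch of the textbook Hamming(7,4) code, which adds 3 parity bits to 4 data bits and can correct any single flipped bit. It is only an illustration: the function names are made up for this example, and it is not meant to describe what any particular memory controller implements.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword laid out [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome is the 1-based position of the flipped bit, or 0 if none.
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


# Usage: flip one bit "in transit" and recover the original data.
word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1
assert hamming74_decode(word) == [1, 0, 1, 1]
```

For comparison, storing each of those 4 data bits three times would take 12 bits instead of 7 for the same single-error correction.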

vlovich123 3 days ago | parent [-]

Then you're replicating and not error correcting. I've not seen any replication systems that use the replicas to detect errors. Even RAID 1, which is a pure mirroring solution, only fetches one of the copies when reading and will ignore corruption on one of the disks unless you initiate a manual verification. There are technical reasons for that, related to read amplification as well as what it does to your storage cost.

cortesoft 3 days ago | parent [-]

I guess that is true, pure replication would not allow you to correct errors, only detect them.

However, I think explaining the concept as duplicating some data isn’t horribly wrong for non-technical people. It is close enough to allow the person to understand the concept.

vlovich123 3 days ago | parent [-]

To be clear, a hypothetical replication system with 3 copies could be used to correct errors using majority voting.

However, there's no replication system I've ever seen (memory, local storage, or distributed storage) that detects or corrects for errors using replication because of the read amplification problem.
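
As a rough sketch of that hypothetical 3-copy scheme (the function names are made up for illustration, and this is not how any real storage or memory system is claimed to work): with two copies you can only notice that they disagree, while with three you can take a bitwise majority. Note that every read has to touch all three copies, which is the read amplification cost.

```python
def detect_mismatch(a: bytes, b: bytes) -> bool:
    """Two copies: we can tell they differ, but not which one is right."""
    return a != b


def majority_vote(a: bytes, b: bytes, c: bytes) -> bytes:
    """Three copies: each output bit is whatever at least two copies agree on."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("all copies must have the same length")
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


# Usage: one copy suffers a single bit flip; the other two outvote it.
original = b"ECC"
damaged = bytes([original[0] ^ 0x04]) + original[1:]
assert detect_mismatch(original, damaged)
assert majority_vote(original, damaged, original) == original
```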

bawolff 3 days ago | parent [-]

https://en.wikipedia.org/wiki/Triple_modular_redundancy

vlovich123 2 days ago | parent [-]

The ECC memory page has the same nonsensical statement:

> Error-correcting memory controllers traditionally use Hamming codes, although some use triple modular redundancy (TMR). The latter is preferred because its hardware is faster than that of Hamming error correction scheme.[16] Space satellite systems often use TMR,[17][18][19] although satellite RAM usually uses Hamming error correction.[20]

So it makes it seem like TMR is used for memory, only to then back off and say it’s not used for it. ECC RAM does not use TMR, and I suggest that the Wikipedia page is wrong and confused about this. The cited links on both pages are either dead or completely unrelated, discussing TMR within the context of FPGAs being sent into space. And yes, TMR is a fault tolerance strategy for logic gates and compute more generally. It is not a strategy that has been employed for storage, full stop, and evidence to the contrary is going to require something stronger than confusing wording on Wikipedia.