vlovich123 2 days ago

RAID-1 does not do on-the-fly error detection or correction. A read is served from one of the disks holding a copy, and the data isn't validated. You can probably initiate an explicit recovery if you suspect there's an error, but that's not automatic. RAID is meant to protect against an entire disk failing, and it blindly assumes the non-failing disk is completely error free. FWIW no formal RAID level I'm aware of does majority voting. Any error detection/correction is implemented through parity bits, with all the problems that parity bits entail, unless you use erasure code versions of RAID 6.
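To make the parity point concrete, here's a rough Python sketch of single XOR parity across a stripe. It's a toy model, not how any real controller or md works, and the function names (`xor_blocks`, `rebuild_missing`) are made up for illustration. Parity can rebuild a block when you already know which one is gone, but on silent corruption it can only tell you the stripe is inconsistent, not which block is wrong.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (hypothetical helper)."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Rebuild the one block known to be missing from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Known failure (erasure): disk 1 is gone, so its block can be rebuilt.
recovered = rebuild_missing([data[0], data[2]], parity)

# Silent corruption: every disk answers, one block is wrong.
corrupted = [data[0], b"BxBB", data[2]]
stripe_is_consistent = xor_blocks(corrupted) == parity
# The mismatch shows something is wrong, but any of the three data blocks
# or the parity block itself could be the bad one.
```

Note the mismatch check only happens if something actually reads the whole stripe and recomputes parity, which a normal read doesn't do.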

The reason things work this way is you'd have 2x read amplification on the bus for error detection and 3x read amplification on the bus for majority-voting error correction, plus something in the read I/O hot path validating the data, which adds latency on top. Additionally, RAID-1 is 1:1 mirroring, so it can't do error correction automatically at all because it doesn't know which copy is the error-free one. At best it can transparently handle errors when the disk refuses to service the request, but it cannot handle corrupt-data errors that the disk doesn't notice. If you do full-disk encryption (FDE) then you probably would notice corruption at least and be able to reliably correct even with just RAID-1, but I'm not sure if anyone leverages this.
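A small Python sketch of the three cases, purely as an illustration. The function names are hypothetical, and the SHA-256 digest stands in for whatever independent integrity check you have (a checksumming filesystem, or the FDE idea above, assuming the encryption layer actually flags corrupted blocks). Two copies alone can detect a mismatch at the cost of reading both; picking the right copy needs either a third voter or an independent check.

```python
import hashlib
from collections import Counter


def read_plain_mirror(copy_a: bytes, copy_b: bytes) -> bytes:
    """Two-way mirror that compares copies: detects a mismatch, cannot repair it."""
    if copy_a != copy_b:
        raise ValueError("copies differ, no way to tell which one is correct")
    return copy_a


def read_with_checksum(copy_a: bytes, copy_b: bytes, expected_digest: bytes) -> bytes:
    """Two-way mirror plus an independently stored digest: return a copy that verifies."""
    for copy in (copy_a, copy_b):
        if hashlib.sha256(copy).digest() == expected_digest:
            return copy
    raise ValueError("both copies fail the checksum")


def majority_vote(copies):
    """Three or more copies: return the value held by a strict majority."""
    value, count = Counter(copies).most_common(1)[0]
    if count * 2 <= len(copies):
        raise ValueError("no majority among copies")
    return value
```

Every one of these reads more than one copy per request, which is exactly the read amplification mentioned above.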

RAID-1 and other backup / duplication strategies are for durability and availability but, importantly, not for error correction. Error correction for durable storage is typically handled by modern techniques based on erasure codes, while memory typically uses Hamming codes because they were the first ones, are cheaper to implement, and match RAM needs better than Reed-Solomon codes. Raptor codes are more recent but the patents are owned by Qualcomm; some have expired but there are continuation patents that might still cover them.
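For a feel of why Hamming codes are cheap, here's a minimal Python sketch of the textbook Hamming(7,4) code: 4 data bits, 3 parity bits, corrects any single flipped bit. This is only the classroom version with made-up function names; real ECC memory uses wider codes (typically with an extra bit for double-error detection) implemented in hardware.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, syndrome); syndrome is the 1-based flipped position, 0 if clean."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the single bad bit back
    return [c[2], c[4], c[5], c[6]], syndrome


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1  # simulate a single bit flip at position 5
decoded, position = hamming74_decode(word)
```

The whole thing is a handful of XORs, which is why it maps so well onto hardware sitting in the memory path.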