DeepSeaTortoise 14 hours ago
IMO it's exactly the right layer, just like for ECC memory. There's a lot of potential for errors when the storage controller processes the data and turns it into analog magic to transmit it. In practice, this is a solved problem, but only until someone makes a mistake. Then it becomes a lot of trouble to debug, with the manufacturer certainly denying their mistake and people getting hung up on the usual suspects.

Doing all the ECC stuff right on the CPU gives you protection against bitrot and resilience against all transmission errors for free. And if things go just right, we might even get better instruction support for ECC. That'd be a nice bonus.
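To make "ECC on the CPU" a bit more concrete, here is a toy sketch of a Hamming(7,4) code in Python. It is only an illustration of the general idea of host-side single-bit correction: the function names are made up for this example, and real storage or memory ECC uses much stronger codes over much larger blocks.

```python
def hamming74_encode(nibble: int) -> list[int]:
    """Encode 4 data bits into a 7-bit Hamming codeword (toy example)."""
    d1, d2, d3, d4 = ((nibble >> i) & 1 for i in range(4))
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code: list[int]) -> tuple[int, int]:
    """Return (corrected nibble, 1-based position of the flipped bit or 0)."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 | (s2 << 1) | (s3 << 2)  # syndrome equals the error position
    if pos:
        c[pos - 1] ^= 1  # flip the single bad bit back
    nibble = c[2] | (c[4] << 1) | (c[5] << 2) | (c[6] << 3)
    return nibble, pos


# Simulate one bit flipping somewhere between the CPU and the medium.
word = hamming74_encode(0b1011)
word[4] ^= 1
value, flipped_at = hamming74_decode(word)
```

A code like this corrects any single flipped bit per codeword but silently miscorrects two, which is why the choice of code has to match the error pattern you actually expect.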
lxgr 14 hours ago
> There's a lot of potential for errors when the storage controller processes and turns the data into analog magic to transmit it.

That's a physical layer, and as such should obviously have end-to-end ECC appropriate to the task. But the error distribution shape is probably very different from that of bytes in NAND data at rest, which is different from that of DRAM and PCI again.

For the same reason, IP does not do error correction, but rather relies on lower layers to present error-free datagram semantics to it: Ethernet, Wi-Fi, and (managed-spectrum) 5G all have dramatically different properties that higher layers have no business worrying about. And sticking with that example, once it becomes TCP's job to handle packet loss due to transmission errors (instead of just congestion), things go south pretty quickly.
johncolanduoni 12 hours ago
ECC memory modules don’t do their own very complicated remapping from linear addresses to physical blocks like SSDs do. ECC memory is also oriented toward fixing transient errors, not persistently bad physical blocks.