Remix.run Logo
throw0101c 5 hours ago

> I always thought that the reason zfs did its extensive CRC checks was primarily to detect data corruption while it was in RAM or over the network, with a side effect that in the rare cares that data on disk got corrupted without the drive detecting it because the CRC was still valid, it'd also be spotted.

Nope, it's always been about on-disk bit rot.

First off: drive firmware has been known to return the wrong LBA data. The OS asks for 123, the drive reads 234—and verifies its drive-level CRC, which passes—and sends it up. Application gets a bundle of bits that's not correct. With ZFS, it expects a certain checksum from that part of the tree/file, and so the LBA 234 gets returned it will not match the checksum that is for 123.

Next, if you have RAID-1, then if the drive has corrupted data, if you don't have higher-level FS checksums, how do you which mirror has the correct data? They're different, but which is correct. With ZFS you know which block has the correct checksum, return that data to application, and then use the correct data to correct the wrong one.

BuildTheRobots 13 minutes ago | parent [-]

I don't know how much better modern drives (and SSDs) have gotten[1], but as someone who started digital hoarding in the mid 90's, on-disk bitrot used to be a massive problem. The amount of my video, audio and pictures that suffered damage was palpable. ZFS offering to fix it was massive selling point and the time and based on personal experience, it delivered.

ZFS also lets you specify number of copies on a single disk. This sounds a bit weird, but as drives suffer block failures far more often than total failures, it's actually surprisingly useful in some situations.

[1] My suspicion is significantly, as storage sizes are now multiple orders of magnitude larger and errors per MB can't have scaled up linearly to match.