alexfoo 4 hours ago
> This stat is also complete bullshit. If it were true, your scrubs of any 20+TB pool would turn up at least corrected errors quite frequently. But this is not the case.

I would expect the ZFS code is written with the expected BER in mind. If it reads something, computes the checksum, and goes "uh oh", it will probably first re-read the block/sector, see that the result is different, possibly re-read it a third time, and if all is OK, continue on without even bothering to log an obvious BER-related error. I would expect it only bothers to log or warn about something when it repeatedly reads the same data that breaks the checksum.

Caveat Reddit, but https://www.reddit.com/r/zfs/comments/3gpkm9/statistics_on_r... has some useful info in it. The OP starts off with a similar premise, that a BER of 10^-14 is rubbish, but then people in charge of very large pools of drives wade in with real-world experience to give more context.
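For what it's worth, the arithmetic behind the quoted claim is easy to sanity-check. Here's a minimal Python sketch, assuming the spec-sheet figure of one unrecoverable error per 10^14 bits read is a literal, uniform per-bit rate (which, per the re-read argument above, probably isn't how it plays out in practice):

```python
import math

# Back-of-the-envelope check of "a 20 TB scrub should hit errors often"
# under a literal reading of the 1-error-per-1e14-bits spec figure.
pool_bytes = 20e12           # 20 TB pool (decimal terabytes)
bits_read = pool_bytes * 8   # a full scrub reads every bit once
error_rate = 1e-14           # errors per bit read, per the spec sheet

expected_errors = bits_read * error_rate          # ~1.6 errors per scrub
p_at_least_one = 1 - math.exp(-expected_errors)   # Poisson approximation, ~80%

print(f"expected errors per full scrub: {expected_errors:.1f}")
print(f"P(at least one error per scrub): {p_at_least_one:.0%}")
```

So under a literal reading, roughly four out of five full scrubs of such a pool should hit at least one error, which is exactly why the quoted comment finds the figure hard to believe.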
digiown 30 minutes ago
That's some very old data. I'm curious how things have changed with all the newer advancements like helium-filled drives, HAMR, etc. From the stats Backblaze helpfully publishes, I feel like the huge variance between models far outweighs this specific stat when it comes to weighing failure risk.

I also thought the figure is for "URE", i.e. unrecoverable even after all of the drive's internal correction mechanisms. I'm aware that drives use various ways to protect against bitrot internally.
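For what it's worth, the per-model spread is easy to pull out of Backblaze's published drive-stats CSVs yourself. A rough sketch, assuming the usual columns of their daily files (date, serial_number, model, failure); the local filename pattern is just a placeholder for wherever you unpack them:

```python
import glob
import pandas as pd

# Rough per-model annualized failure rate from Backblaze's daily drive-stats CSVs.
# Each row is one drive on one day, so row counts are drive-days.
frames = [pd.read_csv(path, usecols=["date", "model", "failure"])
          for path in glob.glob("drive_stats_*.csv")]
df = pd.concat(frames, ignore_index=True)

per_model = df.groupby("model").agg(
    drive_days=("failure", "size"),  # one row per drive per day
    failures=("failure", "sum"),     # failure column is 1 on the day a drive dies
)
per_model["afr"] = per_model["failures"] / per_model["drive_days"] * 365  # failures per drive-year

print(per_model.sort_values("afr", ascending=False).head(10))
```

Even a quick cut like that shows order-of-magnitude differences between models, which is the variance I mean.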