| ▲ | barrkel 4 hours ago | |
I scrub once a quarter because scrubs take 11 days to complete. I have an 8x 18TB raidz2 pool, and I keep a couple of spare drives on hand so I can start a resilver as soon as an issue crops up.

In the past, I've gone for a few years between scrubs. One system had a marginal I/O setup and was unreliable under high streaming load. When copying the pool off of it, I had to throttle the I/O to keep it reliable. No data loss though.

Scrubs are intensive. They will IMO provoke failure in drives sooner than not doing them. But they're the kind of failures you want to bring forward if you can afford the replacements (and often the drives are under warranty anyway).

If you don't scrub, eventually you generally start seeing one of two things: delays in reads and writes because drive error recovery is reading and rereading to recover data; or, if you have that disk behaviour disabled via firmware flags (and you should, unless you're resilvering and on your last disk of redundancy), you see zfs kicking a drive out of the pool during normal operations.

If I start seeing unrecoverable errors, or a drive dropping out of the pool, I'll disable scrubs if I don't have a spare drive on hand to start mirroring straight away. But it's better to have the spares. At least two, because often a second drive shows weakness during resilver.

There is a specific failure mode that scrubs defend against: silent disk corruption that only shows up when you read a file, but in files you almost never read. This is a pretty rare occurrence - it's never happened to me in about 50 drives' worth of pools over 15 years or so.

The way I think about this is, how is it actionable? If it's not a failing disk, you need to check your backups. And thus your scrub interval should be tied to your backup retention.
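
To make the last point concrete, here is a minimal Python sketch of the check. It assumes the worst case is corruption that appears just after a scrub has passed over a block and is only caught near the end of the next scrub, so the detection delay is bounded by roughly the scrub interval plus the scrub duration. The function names and the 90-day and 180-day retention figures are hypothetical; only the quarterly interval (approximated here as 91 days) and the 11-day scrub come from the comment.

```python
from datetime import timedelta


def worst_case_detection_delay(scrub_interval: timedelta,
                               scrub_duration: timedelta) -> timedelta:
    """Rough upper bound on how long silent corruption can go unnoticed.

    Assumption: the block goes bad right after one scrub has read it and
    is found only at the very end of the following scrub.
    """
    return scrub_interval + scrub_duration


def scrub_fits_retention(scrub_interval: timedelta,
                         scrub_duration: timedelta,
                         backup_retention: timedelta) -> bool:
    """True if a clean copy should still be in the backups when a scrub
    finally reports the corruption."""
    delay = worst_case_detection_delay(scrub_interval, scrub_duration)
    return delay <= backup_retention


if __name__ == "__main__":
    quarterly = timedelta(days=91)   # "once a quarter", approximated
    duration = timedelta(days=11)    # scrub time quoted in the comment

    print(worst_case_detection_delay(quarterly, duration).days)  # 102

    # Hypothetical retention windows, for illustration only.
    for retention_days in (90, 180):
        ok = scrub_fits_retention(quarterly, duration,
                                  timedelta(days=retention_days))
        print(retention_days, "days of retention:", "ok" if ok else "too short")
```

Under these assumptions, a quarterly scrub that takes 11 days needs backups going back at least about 102 days; with shorter retention, the last good copy of a rarely read file may already have been rotated out by the time a scrub flags it.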