itchingsphynx 11 hours ago

>Most systems that include ZFS schedule scrubs once per month. This frequency is appropriate for many environments, but high churn systems may require more frequent scrubs.

Is there a more specific 'rule of thumb' for scrub frequency? What variables should one consider?

barrkel 4 hours ago | parent | next [-]

I scrub once a quarter because scrubs take 11 days to complete. I have an 8x 18TB raidz2 pool, and I keep a couple of spare drives on hand so I can start a resilver as soon as an issue crops up.

In the past, I've gone a few years between scrubs. One system had a marginal I/O setup and was unreliable under high streaming load; when copying the pool off of it, I had to throttle the I/O to keep it stable. No data loss, though.

Scrubs are intensive. In my opinion they will provoke failures in drives sooner than not running them would. But those are the kind of failures you want to bring forward, if you can afford the replacements (and often the drives are still under warranty anyway).

If you don't scrub, eventually you generally start seeing one of two things: delays in reads and writes, because the drive's error recovery is reading and rereading to recover data; or, if you have that behaviour disabled via firmware flags (and you should, unless you're resilvering and on your last disk of redundancy), you see ZFS kicking a drive out of the pool during normal operation.
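
The firmware behaviour in question is the drive's SCT Error Recovery Control (TLER/ERC) timeout. A minimal sketch of checking and capping it with smartctl, assuming a drive that supports SCT ERC (/dev/sdX is a placeholder):

    # Show the current read/write error-recovery timeouts (SCT ERC capable drives only)
    smartctl -l scterc /dev/sdX
    # Cap both at 7 seconds (values are in tenths of a second) so the drive
    # gives up on a bad sector quickly instead of retrying for minutes
    smartctl -l scterc,70,70 /dev/sdX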

If I start seeing unrecoverable errors, or a drive dropping out of the pool, I'll disable scrubs if I don't have a spare drive on hand to start mirroring straight away. But it's better to have the spares. At least two, because often a second drive shows weakness during resilver.
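
The replacement itself is one command once the spare is plugged in; a minimal sketch, with pool and device names as placeholders:

    # Check which vdev is degraded, then swap in the spare
    zpool status -x tank
    zpool replace tank /dev/disk/by-id/ata-OLD_DRIVE /dev/disk/by-id/ata-NEW_DRIVE
    # Resilver progress shows up under "scan:" in the status output
    zpool status tank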

There is a specific failure mode that scrubs defend against: silent disk corruption that only shows up when you read a file, but for files you almost never read. This is a pretty rare occurrence; it's never happened to me across roughly 50 drives' worth of pools over 15 years or so. The way I think about it is: how is it actionable? If it's not a failing disk, you need to check your backups, and thus your scrub interval should be tied to your backup retention.

kanbankaren 8 hours ago | parent | prev | next [-]

Once a month might be too frequent, because HDDs are typically rated for a workload of around 180 TB/year. Remember, that limit covers reads and writes combined and doesn't vary much with capacity, so a 10 TB HDD scrubbed monthly consumes about 67% of the rated workload before counting any other usage.

Scrubbing every quarter is usually sufficient without putting high wear on the HDD.

Hakkin 7 hours ago | parent [-]

A scrub only reads allocated space, so in your 10TB example, a scrub would only read whatever portion of that 10TB is actually occupied by data. It's also usually recommended to keep your usage below 80% of the total pool size to avoid performance issues, so the worst case in your scenario would be more like ~53% assuming you follow the 80% rule.
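
If you want to put numbers on that for your own pool, the allocated figure zpool reports is roughly what a scrub will read. A rough sketch of the arithmetic, with the pool name and the 180 TB/year rating as assumptions, keeping in mind the total read volume is spread across the pool's drives:

    #!/bin/bash
    # Sketch: estimate the share of a drive's rated annual workload that
    # monthly scrubs consume. POOL and RATED_TB are assumptions.
    POOL="tank"
    RATED_TB=180
    alloc_bytes=$(zpool list -Hp -o allocated "$POOL")
    alloc_tb=$(echo "$alloc_bytes / 10^12" | bc -l)
    echo "scrub reads per year: $(echo "$alloc_tb * 12" | bc -l) TB"
    echo "share of rating:      $(echo "100 * $alloc_tb * 12 / $RATED_TB" | bc -l) %"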

formerly_proven 6 hours ago | parent [-]

Is the 80% rule real or just passed down across decades like other “x% free” rules? Those waste enormous amounts of resources on modern systems and I kind of doubt ZFS actually needs a dozen terabytes or more of free space in order to not shit the bed. Just like Linux doesn’t actually need >100 GB of free memory to work properly.

magicalhippo 8 minutes ago | parent | next [-]

> Is the 80% rule real or just passed down across decades like other “x% free” rules?

As I understand it, the primary reason for the 80% rule was that you're getting close to another limit, which IIRC was around 90%, where the space allocator would switch from finding a nearby large-enough space to finding the best-fitting space. This second mode tanks performance and can lead to much more fragmentation. And since there's no defrag tool, you're stuck with that fragmentation.

It has also changed: now[1] the switch happens at 96% rather than 90%, and the code has been improved[2] to keep better track of free space.

However, performance can start to degrade before you reach this algorithm switch[3], as you're more likely to generate fragmentation the less free space you have.

That said, it was also generic advice that ignores your specific workload. If you have a lot of cold data with low churn and fairly uniform file sizes, you're probably less affected than if you have high churn with files of varied sizes.

[1]: https://openzfs.github.io/openzfs-docs/Performance%20and%20T...

[2]: https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSZpoolFra...

[3]: https://www.bsdcan.org/2016/schedule/attachments/366_ZFS%20A...
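
If you want to see where a pool sits relative to those thresholds, both numbers are exposed as pool properties; a quick sketch (pool name is a placeholder):

    # FRAG is free-space fragmentation, not file fragmentation, but it climbs
    # as the allocator is left with smaller and smaller gaps to work with
    zpool list -o name,size,allocated,free,capacity,fragmentation tank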

barrkel 4 hours ago | parent | prev | next [-]

In practice you see noticeable degradation in streaming-read performance for large files written after the pool passes 85% or so. Files you could previously expect to read at 500+ MB/s can drop to 50 MB/s. It's fragmentation, and it's fairly scale-invariant in my experience.

cornonthecobra 4 hours ago | parent | prev [-]

Speaking strictly about ZFS internal operations, the free space requirement is closer to 5% on current ZFS versions. That allows for CoW and block reallocations in real-world pools. Heavy churn and very large files will increase that margin.

toast0 11 hours ago | parent | prev | next [-]

Once a month seems like a reasonable rule of thumb.

But you're balancing the cost of the scrub vs the benefit of learning about a problem as soon as possible.

A scrub does a lot of I/O and a fair amount of computing. The scrub load competes with your application load, and depending on the size of your disk(s) and their read bandwidth, it may take quite some time to complete. There's even some potential that the read load could push a weak drive over the edge into failure.

On my personal servers, application load is nearly meaningless, so I do a roughly monthly scrub from cron, which I think will only scrub one zpool at a time per machine; that seems reasonable enough to me. I run relatively large spinning disks, so if I scrubbed daily, the drives would spend most of the day scrubbing, and that doesn't seem reasonable. I haven't run ZFS in a work environment. I'd have to really consider how the scrub's read load impacted the production load, and whether scrubbing with limits to reduce production impact would complete in a reasonable amount of time. I've run some systems that are essentially always busy, and if a scrub would take several months, I'd probably only scrub when other systems indicate a problem and I can take the machine out of rotation to examine it.

If I had very high reliability needs or a long time to get replacement drives, I might scrub more often?

If I were worried about power consumption, I might scrub less often (and also let my servers and drives go into standby). The article's recommendation to scrub at least once every 4 months seems pretty reasonable, although if you have seriously offline disks, maybe once a year is more approachable. I don't think I'd push beyond that; lots of things don't like to sit for a year and then power on correctly.
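
On limiting production impact: newer OpenZFS can pause a scrub and pick it up again later rather than cancelling it, so one option is to only let it run off-peak. A minimal sketch, assuming OpenZFS 0.7 or later and a placeholder pool name:

    zpool scrub tank       # start a new scrub, or resume a paused one
    zpool scrub -p tank    # pause it while production load is high
    zpool scrub -s tank    # or cancel it outright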

atmosx 10 hours ago | parent | prev | next [-]

Once a month is fine ("/etc/cron.monthly/zfs-scrub"):

    #!/bin/bash
    #
    # ZFS scrub script for monthly maintenance
    # Place in /etc/cron.monthly/zfs-scrub
    
    POOL="storage"
    TAG="zfs-scrub"
    
    # Log start
    logger -t "$TAG" -p user.notice "Starting ZFS scrub on pool: $POOL"
    
    # Run the scrub
    if /sbin/zpool scrub "$POOL"; then
        logger -t "$TAG" -p user.notice "ZFS scrub initiated successfully on pool: $POOL"
    else
        logger -t "$TAG" -p user.err "Failed to start ZFS scrub on pool: $POOL"
        exit 1
    fi
    
    exit 0
k_bx 8 hours ago | parent | next [-]

Didn't know about logger, looks nice. Can it wrap the launch of the scrub itself so that the scrub's own output goes through logger too, or do you track its stdout/stderr separately when something happens?

update: figured out how to improve that call so its output goes to logger as well
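
One way to do it (a sketch, reusing POOL and TAG from the parent script; pipefail so the if-test reflects zpool's exit status rather than logger's):

    set -o pipefail
    # Send zpool's own stdout/stderr to syslog as well
    if /sbin/zpool scrub "$POOL" 2>&1 | logger -t "$TAG" -p user.notice; then
        logger -t "$TAG" -p user.notice "ZFS scrub initiated successfully on pool: $POOL"
    else
        logger -t "$TAG" -p user.err "Failed to start ZFS scrub on pool: $POOL"
        exit 1
    fi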

nubinetwork 5 hours ago | parent [-]

Scrub doesn't log anything by default; you run it and it returns quickly. You have to get the results out of zpool status or through ZED.

chungy 9 hours ago | parent | prev [-]

That script might do with the "-w" parameter passed to scrub. Then "zpool scrub" won't return until the scrub is finished.
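
A sketch of that variant, assuming an OpenZFS version with "zpool scrub -w" (2.0 or later) and the POOL/TAG variables from the parent script:

    # Block until the scrub actually completes, then log the outcome
    if /sbin/zpool scrub -w "$POOL"; then
        logger -t "$TAG" -p user.notice "Scrub finished on $POOL: $(/sbin/zpool status "$POOL" | grep 'scan:')"
    else
        logger -t "$TAG" -p user.err "Scrub failed or was interrupted on pool: $POOL"
        exit 1
    fi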

ssl-3 11 hours ago | parent | prev | next [-]

The cost of a scrub is just a flurry of disk reads and a reduction in performance while it runs.

If this cost is affordable on a daily basis, then do a scrub daily. If it's only affordable less often, then do it less often.

(Whatever the case: It's not like a scrub causes any harm to the hardware or the data. It can run as frequently as you elect to tolerate.)

agapon 8 hours ago | parent [-]

With HDDs, it's also mechanical wear and increased chance of a failure. SSDs are not fully immune to increased load either.

ssl-3 8 hours ago | parent [-]

Is there any evidence that suggests that reading from a hard drive (instead of it just spinning idle) increases physical wear in any meaningful way? Likewise, is there any evidence of this for solid-state storage?

rcxdude 6 hours ago | parent | next [-]

Yes. Hard drives have published "Annualized Workload Rate" ratings, which are in TB/year, and the manufacturers state there is no difference between reads and writes for the purpose of this rating.

(https://www.toshiba-storage.com/trends-technology/mttf-what-...)

For SSDs, writes matter a lot more. Reads may increase the temperature of the drive, so they'll have some effect, but I don't think I've seen a read endurance rating for an SSD.

digiown an hour ago | parent | prev [-]

Reading from a drive requires the read heads to move, as opposed to spinning idle with the heads parked off to the side. Moving parts generally wear out over time.

nubinetwork 11 hours ago | parent | prev [-]

Total pool size and speed: less data scrubs faster, as do faster disks, and topology matters too (a 3-way stripe of NVMe will scrub faster than a single SATA SSD).

For what it's worth, I scrub daily mostly because I can. It's completely overkill, but if it only takes half an hour, then it can run in the middle of the night while I'm sleeping.
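
If you're not sure whether your pool is in "half an hour" territory, the duration of the last scrub is recorded in the status output; a quick check (pool name is a placeholder):

    # The "scan:" line reports how long the last scrub took and when it finished
    zpool status tank | grep scan: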