agapon 11 hours ago

Generally, it's possible to have data which is not corrupted but which is logically inconsistent (incorrect).

Imagine that a directory ZAP has an entry that points to a bogus object ID. That would be an example. The ZAP block is intact but its content is inconsistent.

Such things can only happen through a logical bug in ZFS itself, not through some external force. But bugs do happen.
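
If it helps, here is a toy sketch of that failure mode (plain Python, invented names, nothing to do with the real on-disk structures): the directory block re-hashes fine, so a scrub-style check passes, but one entry points at an object that does not exist.

    import hashlib

    # Toy pool: object ID -> payload. Object 3 was never created, so a
    # directory entry naming it is a dangling reference.
    objects = {1: b"root directory", 2: b"file contents"}

    # Toy directory ZAP: name -> object ID. The block is internally
    # well-formed; the "ghost" entry is the logical inconsistency.
    directory = {"file.txt": 2, "ghost": 3}

    block = repr(sorted(directory.items())).encode()
    stored_checksum = hashlib.sha256(block).hexdigest()

    # Scrub-style check: re-hash the block. Passes -- no corruption.
    assert hashlib.sha256(block).hexdigest() == stored_checksum

    # fsck-style check: validate what the entries point at.
    for name, obj_id in directory.items():
        if obj_id not in objects:
            print(f"dangling entry: {name!r} -> object {obj_id}")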

If you search through OpenZFS bugs you will find multiple instances. Things like leaked objects or space, etc. That's why zdb now has support for some consistency checking (but not for repairs).
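
Leaked objects are the mirror image: every entry points at a valid object, but some allocated object is no longer referenced by anything. Same toy model as above; if I recall correctly zdb's -b traversal does a leak check of roughly this shape, but treat the man page as authoritative.

    # Toy pool again: object 4 is allocated but nothing references it.
    objects = {1: b"root directory", 2: b"file contents", 4: b"orphan"}
    directory = {"file.txt": 2}

    # Reachability pass from the root (object 1): anything allocated
    # but unreachable has been leaked.
    reachable = {1} | set(directory.values())
    print("leaked objects:", sorted(set(objects) - reachable))  # [4]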

mustache_kimono 10 hours ago | parent

> Imagine that a directory ZAP has an entry that points to a bogus object ID. That would be an example. The ZAP block is intact but its content is inconsistent.

The above is interesting and fair enough, but a few points:

First, I'm not sure it supports what seems to be the parent's point -- that scrub is an inadequate replacement for an fsck.

Second, I'm really unsure if your case is the situation the parent is referring to. The parent seems to be indicating actual data loss, not leaked objects or space or bogus object IDs -- that she/he scrubs with no errors and then, when she/he tries to read back a file, oops, ZFS can't.

rincebrain 3 hours ago | parent

The two obvious examples that come to mind are native encryption bugs and spacemap issues.

Nothing about walking the entire tree of blocks and checking hashes validates the spacemaps - they only come up when you're allocating new blocks. There have been a number of bugs where ZFS panics because the spacemaps say something insane, so you wind up needing to import readonly or discard the ZIL, because if you import RW it panics about trying to allocate an already-allocated segment. And if your ondisk spacemaps are inconsistent in a way that discarding the ZIL doesn't work around, you would need some additional tool to try and repair them, because ZFS has no knobs for it.
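
Roughly what that looks like in toy form (Python, invented names - the real allocator checks range overlap in range trees, not set membership):

    # Toy spacemap: set of allocated (offset, length) segments.
    allocated = {(0, 128), (128, 128)}

    def allocate(seg):
        # Stand-in for the VERIFY-style assertion that fires when the
        # spacemap already shows the segment as allocated.
        if seg in allocated:
            raise AssertionError(f"segment {seg} already allocated")
        allocated.add(seg)

    allocate((256, 128))  # fine
    allocate((128, 128))  # "panic": the spacemap says it's taken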

Native encryption issues wouldn't be noticed because scrubbing doesn't attempt to untransform data blocks - you indirectly do that when you're walking the structures involved, but the L0 data blocks don't get decompressed or decrypted, since all your hashes are of the transformed blocks. And if you have a block where the hash in the metadata is correct but it doesn't decrypt, for any reason, scrub won't notice, but you sure will if you ever try to decrypt it.
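
A toy version of why that is (again Python with invented names; the XOR "cipher" and the key mixup are stand-ins, not how the real crypto works): the checksum covers the ciphertext, so scrub passes, while the authentication check on decrypt fails.

    import hashlib, hmac

    def keystream(key, n):
        # Derive n bytes of keystream from the key (toy construction).
        out, ctr = b"", 0
        while len(out) < n:
            out += hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
            ctr += 1
        return out[:n]

    def xor(data, ks):
        return bytes(a ^ b for a, b in zip(data, ks))

    plaintext = b"hello, zfs"
    ciphertext = xor(plaintext, keystream(b"right key", len(plaintext)))

    # The block pointer's checksum covers the *transformed* data; the
    # MAC only checks out after a successful decrypt.
    stored_hash = hashlib.sha256(ciphertext).digest()
    stored_mac = hmac.new(b"right key", plaintext, "sha256").digest()

    # Scrub: re-hash the ciphertext. Passes regardless of key bugs.
    assert hashlib.sha256(ciphertext).digest() == stored_hash

    # Read path where a bug wired up the wrong key: the hash above was
    # fine, but the decrypted data fails authentication.
    garbage = xor(ciphertext, keystream(b"wrong key", len(ciphertext)))
    ok = hmac.compare_digest(
        hmac.new(b"right key", garbage, "sha256").digest(), stored_mac)
    print("scrub passed; decrypt authenticated?", ok)  # False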

mustache_kimono 2 hours ago | parent

> The two obvious examples

Appreciate this rincebrain. I know that you know better than most, and this certainly covers my 2nd point. I don't imagine these cases cover my first point though? These are not bugs of the type an fsck would catch?