vbezhenar 16 hours ago

Please note that some filesystems, namely bcachefs, btrfs and zfs, seem to be immune to this issue, probably because they don't just delegate writes directly to the block layer with the O_DIRECT flag. But it is important to be aware of this issue.

saurik 16 hours ago

While those are all "filesystems", they are also (internally) alternatives to MD RAID; like, you could run zfs on top of MD RAID, but it feels like a waste of zfs (and the same largely goes for btrfs and bcachefs). It is therefore not at all clear to me that it is the filesystems that are "immune to this issue" rather than their respective RAID-like behaviors, as it seems to be the latter that the discussion was focusing on (hence the initial mention of potentially adding btrfs to the issue, which did not otherwise mention any filesystem at all). Put another way: if you did the unusual thing of running zfs on top of MD RAID, I bet you would still be vulnerable to this scenario.

(Oh, unless you are maybe talking about something orthogonal to the fixes mentioned in the discussion thread, such as some property of the extra checksumming done by these filesystems? And so, even if the disks de-synchronize, maybe zfs will detect an error if it reads "the wrong one" off of the underlying MD RAID, rather than ending up with the other content?)

ludocode 14 hours ago

These filesystems are not really alternatives because mdraid supports features those filesystems do not. For example, parity raid is still broken in btrfs (so it effectively does not support it), and last I checked zfs can't grow a parity raid array while mdraid can.

I run btrfs on top of mdraid in RAID6 so I can incrementally grow it while still having copy-on-write, checksums, snapshots, etc.

I hope that one day btrfs fixes its parity raid or bcachefs will become stable enough to fully replace mdraid. In the meantime I'll continue using mdraid with a copy-on-write filesystem on top.

bananapub 13 hours ago

> zfs can't grow a parity raid array while mdraid can.

indeed out of date - that was merged a long time ago and shipped in a stable version earlier this year.

koverstreet 8 hours ago

soon :)

Polizeiposaune 14 hours ago

ZFS puts checksums in the block pointer, so, unless you disable checksums, it always knows the expected checksum of a block it is about to read.

When the actual checksum of what was read from storage doesn't match the expected value, it will try reading alternate locations (if there are any), and it will write back the corrected block if it succeeds in reconstructing a block with the expected checksum.
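A rough Python sketch of the read path described above, under stated simplifications. The expected checksum lives in the parent's block pointer rather than in the block itself, each alternate location is tried in turn, and copies that failed before a good one is found get overwritten with the good data. `BlockPointer`, `locations` and `self_healing_read` are hypothetical names; SHA-256 stands in for zfs's default fletcher4, and parity (RAIDZ) reconstruction is not modeled, only whole-copy alternates such as mirror legs.

```python
import hashlib
from dataclasses import dataclass


def checksum(data: bytes) -> bytes:
    # zfs defaults to fletcher4; SHA-256 is used here only for simplicity.
    return hashlib.sha256(data).digest()


@dataclass
class BlockPointer:
    locations: list[int]  # hypothetical: keys into `storage`, one per copy of the block
    checksum: bytes       # expected checksum, stored in the parent, not next to the data


def self_healing_read(storage: dict[int, bytes], bp: BlockPointer) -> bytes:
    bad: list[int] = []
    for loc in bp.locations:
        data = storage.get(loc)
        if data is not None and checksum(data) == bp.checksum:
            # Found a copy with the expected checksum: repair the copies that failed
            # earlier. Copies after this one are not read here (that is a scrub's job).
            for bad_loc in bad:
                storage[bad_loc] = data
            return data
        bad.append(loc)
    raise OSError("unrecoverable checksum error: no copy matches the block pointer")


if __name__ == "__main__":
    good = b"good data"
    storage = {0: b"bit-rotted", 1: good}
    bp = BlockPointer(locations=[0, 1], checksum=checksum(good))
    print(self_healing_read(storage, bp))  # returns the good copy
    print(storage[0])                      # first copy has been rewritten
```

Because the checksum comes from the parent, a copy that is internally consistent but simply "the wrong one" still fails verification, which is what lets the read fall through to an alternate instead of silently returning it.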