▲ | koverstreet 4 days ago |
1 - You absolutely can and should walk reverse mappings in the filesystem so that from a corrupt block you can tell the user which file was corrupted. In the future bcachefs will be rolling out auxiliary dirent indices for a variety of purposes, and one of those will be to give you a list of files that have had errors detected by e.g. scrub (we already generally tell you the affected filename in error messages).

2 - No, metadata robustness absolutely varies across filesystems. From what I've seen, ext4 and bcachefs are the gold standard here; both can recover from basically arbitrary corruption and have no single points of failure. Other filesystems do have single points of failure (notably btree roots), and btrfs and, I believe, ZFS are painfully vulnerable to devices with broken flush handling. You can (and should) blame the device and the shitty manufacturers, but from the perspective of a filesystem developer, we should be able to cope with that without losing the entire filesystem.

XFS is quite a bit better than btrfs (and, I believe, ZFS) because it has a ton of ways to reconstruct from redundant metadata if it loses a btree root, but it's still possible to lose the entire filesystem if you're very, very unlucky. On a modern filesystem that uses b-trees, you really need a way of repairing from lost b-tree roots if you want your filesystem to be bulletproof. btrfs has 'dup' mode, but that doesn't mean much on SSDs, given that you have no control over whether your replicas get written to the same erase unit.

Reiserfs actually had the right idea: btree node scan, and reconstruct your interior nodes if necessary. But they gave that approach a bad name; for a long time it was a crutch for a buggy b-tree implementation, and they didn't seed a filesystem-specific UUID into the btree node magic number like bcachefs does, so it could famously merge a filesystem from a disk image with the host filesystem.

bcachefs got that part right, and also has per-device bitmaps in the superblock for "this range of the device has btree nodes", so node scan is actually practical even if you've got a massive filesystem on spinning rust - and it was introduced long after the b-tree implementation was widely deployed and bulletproof. |
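To illustrate the point about seeding a filesystem-specific UUID into the node magic: a node scan that keys its magic on the superblock UUID can only ever pick up nodes written by *this* filesystem, so nodes from a foreign disk image get skipped instead of merged. This is a minimal sketch with a made-up on-disk layout (the magic derivation, field sizes, and node layout here are illustrative assumptions, not bcachefs's actual format):

```python
import hashlib
import uuid

def node_magic(fs_uuid: uuid.UUID) -> bytes:
    # Derive a per-filesystem magic from the superblock UUID
    # (hypothetical scheme). A node written by any other filesystem
    # instance carries a different magic and is ignored by the scanner.
    return hashlib.sha256(b"btree-node" + fs_uuid.bytes).digest()[:8]

def scan_for_nodes(device: bytes, fs_uuid: uuid.UUID, node_size: int = 4096):
    """Walk the device in node_size steps, yielding offsets of btree
    nodes that belong to this filesystem and no other."""
    magic = node_magic(fs_uuid)
    for off in range(0, len(device) - len(magic) + 1, node_size):
        if device[off:off + len(magic)] == magic:
            yield off

# Two filesystems, two UUIDs: a scan keyed to fs_a ignores fs_b's nodes,
# even if a dd'd image of fs_b is sitting on the same device.
fs_a, fs_b = uuid.uuid4(), uuid.uuid4()
device = bytearray(4096 * 4)
device[0:8] = node_magic(fs_a)        # a node from our filesystem
device[4096:4104] = node_magic(fs_b)  # a node from a foreign disk image
device[8192:8200] = node_magic(fs_a)  # another of our nodes

print(list(scan_for_nodes(bytes(device), fs_a)))  # → [0, 8192]
```

A magic number that is a plain constant (as in reiserfs) would match all three nodes here, which is exactly the failure mode that let a scan merge a disk image into the host filesystem.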
▲ | magicalhippo 4 days ago | parent | next [-]
> XFS is quite a bit better than btrfs, and I believe ZFS, because they have a ton of ways to reconstruct from redundant metadata if they lose a btree root

As I understand it, ZFS also has a lot of redundant metadata (copies=3 on anything important), and also previous uberblocks [1]. In what way is XFS better? Genuine question, not really familiar with XFS.

[1]: https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSMetadata...
▲ | 4 days ago | parent | prev | next [-]
[deleted]
▲ | ajross 20 hours ago | parent | prev [-]
> 2 - No, metadata robustness absolutely varies across filesystems.

That's a misunderstanding of the subthread. The upthread point was about metadata atomicity in snapshots, not hardware corruption recovery. A filesystem like ZFS can make sure the journal is checkpointed atomically with the CoW snapshot moment, where dm obviously can't. And I pointed out this wasn't actually helpful, because this is a problem that has to be solved above the filesystem, in databases and apps, since it's isomorphic to power loss (something the filesystem can't prevent).