cmurf | 4 hours ago
> Basically, btrfs appears to be not fit for purpose. The entire point of such a filesystem is that it should be able to run in adverse environments (like faulty hardware) and be tolerant to errors. It should always be possible to repair such a filesystem back to a canonical state.

Pretty sure all file systems and their developers are unsurprised by file system corruption occurring on bad hardware.

There are also drives that report a successful flush or FUA when the expected (meta)data is not yet on stable media. That results in out-of-order writes. There's no consequence unless there's a badly timed crash or power failure; in that case there are out-of-order writes and possibly dropped writes (whatever was left in the write cache). File system developers have told me that their designs do not account for drives miscommunicating that a flush/FUA succeeded when it hasn't. This is like operating under nobarrier some of the time.

Overwriting file systems keep their metadata at fixed locations, so quite a lot of assumptions can be made during repair about what should be there, inferring it from metadata in other locations. Btrfs has no fixed locations for metadata. This leads to unique flexibility, and to repair difficulty. Flexible: being able to convert between different block group profiles (single, dup, and all the raids), run on unequal-sized drives, and convert in place from any file system anybody wants to write the code for. All of this is possible because only the per-device super blocks have fixed locations; everything else can be written anywhere. But the repair utility can't make many assumptions, and if the story told by the metadata that is present isn't consistent, the repair necessarily must fail.

With Btrfs the first step is a read-only rescue mount, which uses backup roots to find a valid root tree and can also ignore damaged trees. This read-only mount is often enough to extract important data that hasn't been (recently) backed up.

Since moving to Btrfs by default in Fedora almost 10 releases ago, we haven't seen more file system problems. One problem we do see more often is evidence of memory bitflips. This makes some sense because the file system metadata isn't nearly as big a target as data, and since both metadata and data are checksummed, Btrfs is more likely to detect such issues.
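A note on the flush/FUA point: if you suspect a drive of acknowledging flushes it hasn't completed, one blunt, generic mitigation is to check and disable its volatile write cache. This is a general sketch, not something cmurf prescribes, and /dev/sdX is a placeholder:

    # Query the drive's volatile write-cache setting (placeholder device):
    hdparm -W /dev/sdX

    # Disable the write cache: slower, but writes no longer sit in a
    # volatile cache that depends on honest flush/FUA reporting.
    hdparm -W0 /dev/sdX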
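The read-only rescue mount described above corresponds to real mount options: rescue=usebackuproot retries with the backup root tree pointers kept in the super block, and rescue=all (kernel 5.11 and later) additionally ignores damaged trees and data checksums. Device path and mount point here are placeholders:

    # Try backup roots if the primary root tree is damaged:
    mount -o ro,rescue=usebackuproot /dev/sdX /mnt

    # Kernel 5.11+: also ignore bad roots and data checksums,
    # maximizing what can still be read out:
    mount -o ro,rescue=all /dev/sdX /mnt

    # Then copy out anything not recently backed up:
    cp -a /mnt/important /backup/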
phire | 3 hours ago | parent
To be clear, I'm not expecting btrfs (or any filesystem) to avoid corrupting itself on unreliable hardware. I'm not expecting it to magically avoid unavoidable data loss. All I want is an fsck that I can trust.

I love that btrfs will actually alert me to bad hardware. But then I expect to be able to replace the hardware, run fsck (or scrub, or whatever), and get back to the best-case healthy state with minimal fuss. And by "healthy" I don't mean ready for me to extract data from; I mean ready for me to mount and continue using.

In my case, I had zero corrupted metadata and a second copy of all data. fsck/scrub should have been able to fix everything with zero interaction. If files/metadata are corrupted, fsck/scrub should provide tooling for dealing with them. Delete them? Restore them anyway? Manual intervention? IMO, failure is not a valid option.
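For reference, the "replace the hardware, then heal" flow phire expects does exist as commands when the redundant copies are intact; this is a sketch with placeholder device paths, and the complaint is precisely about the cases where it fails anyway:

    # Swap the failing device for a new one while mounted:
    btrfs replace start /dev/bad /dev/new /mnt
    btrfs replace status /mnt

    # Scrub reads all copies, verifies checksums, and rewrites bad
    # copies from good ones where the profile (dup/raid1/...) has them:
    btrfs scrub start -Bd /mnt

    # Per-device error counters accumulated so far:
    btrfs device stats /mnt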