phire 12 hours ago

> Corrupt my drive twice, can't corrupt my drive again.

Exact same drive? You might want to check that drive isn't silently corrupting data.

I still blame btrfs; something very similar happened to me.

I had a WD Green drive with a known flaw where it would just silently zero data on writes in some random situations. EXT4 worked fine on this drive for years (the filesystem structure was fine; my files just had random zeroed sections). But btrfs just couldn't handle this situation and immediately got itself into an unrecoverable state; scrub and fsck just couldn't fix the issue.

In one way, I was better off. At least I now knew that drive had been silently corrupting data for years. But it destroyed my confidence in btrfs forever. Btrfs didn't actually lose any additional data for me; it was in RAID and the data was all still there, so it should have been able to recover itself.

But it simply couldn't. I had to manually use a hex editor to piece a few files back together (and restore many others from backup).

Even worse, when I talked to people on the #btrfs IRC channel, not only was nobody surprised that btrfs had borked itself due to bad hardware, but everyone's advice was that a btrfs filesystem that had been borked could never be trusted again. Instead, the only way to get a trustworthy, clean, and canonical btrfs filesystem was to delete it and start from scratch (this time without the stupid faulty drive).

Basically, btrfs appears to be not fit for purpose. The entire point of such a filesystem is that it should be able to run in adverse environments (like faulty hardware) and be tolerant to errors. It should always be possible to repair such a filesystem back to a canonical state.

bigstrat2003 6 hours ago

I too have had data loss from BTRFS. Had a RAID-1 array where one of the drives started flaking out; sometimes it would disappear when rebooting the system. Unfortunately, before I could replace the drive, I booted one time to find the array corrupted and unrecoverable (or at least unrecoverable at my skill level). This wasn't a long time ago either; it was within the last 2-3 years. When I got the new drive and rebuilt the array, I used ZFS and it has been rock solid.

pmarreck 7 hours ago

I wrote a tool to try to attack this specific problem (subtle, random drive corruption) in the general sense (https://github.com/pmarreck/bitrot_guard), but it requires re-running it for any modified files, which makes it mainly suitable for long-term archival purposes. I'm not sure why one of these filesystems doesn't just invisibly include/update some par2 or other parity data so you at least get some protection/insurance against unexpected corruption (plus notification when things are starting to go awry).
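A hand-rolled version of the same idea, using the stock par2 CLI directly, looks roughly like this (a sketch only; the 10% redundancy level is an arbitrary example, not necessarily how the tool above does it):

    # create ~10% recovery data alongside an archive file
    par2 create -r10 archive.tar.par2 archive.tar

    # later: check the file against the stored parity
    par2 verify archive.tar.par2

    # if corruption is found, rebuild the damaged blocks
    par2 repair archive.tar.par2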

cmurf 4 hours ago

> Basically, btrfs appears to be not fit for purpose. The entire point of such a filesystem is that it should be able to run in adverse environments (like faulty hardware) and be tolerant to errors. It should always be possible to repair such a filesystem back to a canonical state.

Pretty sure all file systems and their developers are unsurprised by file system corruption occurring on bad hardware.

There are also drives that report a successful flush or FUA when the expected (meta)data is not yet on stable media. That results in out-of-order writes. There's no consequence unless there's a badly timed crash or power failure; in that case you get out-of-order writes and possibly dropped writes (whatever was left in the write cache).

File system developers have told me that their designs do not account for drives falsely reporting that a flush/FUA succeeded when it hasn't. This is like operating with nobarrier some of the time.

Overwriting file systems keep their metadata at fixed locations, so quite a lot of assumptions can be made during repair about what should be there, inferring it from metadata in other locations.

Btrfs has no fixed locations for metadata. That gives it unique flexibility, and unique repair difficulty. Flexible: it can convert between different block group profiles (single, dup, and all the raids), run on unequal-sized drives, and convert in place from any file system anybody wants to write the code for, because only the per-device super blocks have fixed locations; everything else can be written anywhere. But the repair utility can't make many assumptions, and if the story told by whatever metadata is present isn't consistent, the repair necessarily must fail.

With Btrfs the first step is a read-only rescue mount, which uses backup roots to find a valid root tree and can also ignore damaged trees. This read-only mount is often enough to extract important data that hasn't been (recently) backed up.
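On reasonably recent kernels that looks something like the following (see btrfs(5) for the exact rescue options available; device and mountpoint here are placeholders):

    # try a read-only mount using one of the backup root trees
    mount -o ro,rescue=usebackuproot /dev/sdX /mnt

    # last resort: also ignore bad roots and data checksums
    mount -o ro,rescue=all /dev/sdX /mnt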

Since moving to Btrfs by default in Fedora almost 10 releases ago, we haven't seen more file system problems. One problem we do see more often is evidence of memory bitflips. This makes some sense: file system metadata is a much smaller target than data, and since Btrfs checksums both metadata and data, it's more likely to detect such issues.
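Detected corruption is visible in the per-device error counters and in scrub output, e.g.:

    # cumulative read/write/corruption error counters per device
    btrfs device stats /mnt

    # scrub re-reads everything and verifies checksums
    btrfs scrub start /mnt
    btrfs scrub status /mnt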

phire 4 hours ago

To be clear, I'm not expecting btrfs (or any filesystem) to avoid corrupting itself on unreliable hardware. I'm not expecting it to magically avoid unavoidable data loss.

All I want is an fsck that I can trust.

I love that btrfs will actually alert me to bad hardware. But then I expect to be able to replace the hardware and run fsck (or scrub, or whatever) and get back to the best-case healthy state with minimal fuss. And by "healthy" I don't mean ready for me to extract data from; I mean ready for me to mount and continue using.

In my case, I had zero corrupted metadata, and a second copy of all data. fsck/scrub should have been able to fix everything with zero interaction.
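Concretely, the sequence that should have been enough on its own is something like the usual replace-and-scrub workflow (device names are placeholders):

    # swap the failing drive for the new one and rebuild its copy
    btrfs replace start /dev/sdOLD /dev/sdNEW /mnt

    # then rewrite anything with bad checksums from the good copy
    btrfs scrub start /mnt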

If files/metadata are corrupted, fsck/scrub should provide tooling for how to deal with them. Delete them? Restore them anyway? Manual intervention? IMO, failure is not a valid option.