> XFS is quite a bit better than btrfs, and I believe ZFS, because they have a ton of ways to reconstruct from redundant metadata if they lose a btree root

As I understand it, ZFS also has a lot of redundant metadata (copies=3 on anything important), and it also keeps previous uberblocks[1].
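Here is a toy Python sketch of why extra metadata copies help. It assumes a simplified model: the parent block pointer stores a checksum, each copy is tried in turn, the first copy that verifies is returned, and that copy is used to rewrite the bad ones. `BlockCopy` and `read_with_ditto` are hypothetical names, and sha256 stands in for whatever checksum the pool actually uses. This is not the OpenZFS code.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class BlockCopy:
    """One on-disk copy of a metadata block (hypothetical stand-in for a ZFS ditto copy)."""
    data: bytes


def checksum(data: bytes) -> bytes:
    # sha256 is an assumption for simplicity; ZFS supports several checksum algorithms.
    return hashlib.sha256(data).digest()


def read_with_ditto(copies: list[BlockCopy], expected: bytes) -> bytes:
    """Return the first copy matching the checksum kept in the parent pointer,
    then overwrite any copies that failed verification (self-healing)."""
    good = None
    bad = []
    for copy in copies:
        if checksum(copy.data) == expected:
            if good is None:
                good = copy.data
        else:
            bad.append(copy)
    if good is None:
        raise IOError("all copies failed checksum verification")
    for copy in bad:
        copy.data = good  # repair the damaged copy from a verified one
    return good


# Example: one of three copies has rotted; the read still succeeds and heals it.
payload = b"indirect block contents"
copies = [BlockCopy(b"bit-rotted garbage"), BlockCopy(payload), BlockCopy(payload)]
assert read_with_ditto(copies, checksum(payload)) == payload
assert all(c.data == payload for c in copies)
```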

In what way is XFS better? Genuine question; I'm not really familiar with XFS.

[1]: https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSMetadata...

koverstreet 4 days ago

I can't speak with any authority on ZFS; I know its structure the least of all the major filesystems.

I do a ton of reading through forums gathering user input, and lots of people chime in with stories of lost filesystems. I've seen reports of lost filesystems with ZFS, and I want to say I've seen them at around the same frequency as with XFS; both are very rare.

My concern with ZFS is that they seem to have taken the same "no traditional fsck" approach as btrfs, favoring entirely online repair. That's obviously where we all want to be, but it's very hard to get right, and in my experience, if you prioritize it too much you miss the "disaster recovery" scenarios. That seems to be what's happened with ZFS: I've read that if your ZFS filesystem is toast, you need to send it to a data recovery service.

That's not something I would consider acceptable: fsck ought to be able to do anything a data recovery service would do, and for bcachefs it does.

I know the XFS folks have put a ton of outright paranoia into repair, including full-on disaster recovery scenarios. There are scenarios bcachefs can repair that XFS can't, but on the other hand, XFS has tricks that bcachefs doesn't, so I can't call bcachefs unequivocally better; we'd need to wait for more widespread usage and a lot more data.

p_l 3 days ago

The lack of a traditional fsck is because its operation would be exactly the same as normal driver operation. The most extreme case involves a very obscure option that lets you explicitly rewind to a transaction you specify, which I've seen used to recover from a broken driver upgrade that led to filesystem corruption in ways that most fscks just barf on, including XFS's.

For low-level meddling and recovery, there's a filesystem debugger (zdb) that understands all parts of ZFS and can help, for example, with identifying a previous uberblock that is uncorrupted, or with recovering specific data.
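Roughly how picking an uberblock (and rewinding) works, as a simplified Python model: a ring of uberblocks is kept, import normally takes the newest one whose checksum verifies, and a rewind just caps the allowed transaction group. `Uberblock`, `select_uberblock` and the `max_txg` parameter are hypothetical names for illustration, not the OpenZFS or zdb interface, and sha256 is a stand-in checksum.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Uberblock:
    txg: int           # transaction group this uberblock commits
    payload: bytes     # stand-in for the rest of the uberblock (root block pointer, etc.)
    stored_sum: bytes  # checksum recorded when the uberblock was written

    def is_valid(self) -> bool:
        return hashlib.sha256(self.payload).digest() == self.stored_sum


def write_uberblock(txg: int, payload: bytes) -> Uberblock:
    return Uberblock(txg, payload, hashlib.sha256(payload).digest())


def select_uberblock(ring: list[Uberblock], max_txg: Optional[int] = None) -> Uberblock:
    """Newest uberblock that verifies; with max_txg set, ignore anything newer (a rewind)."""
    candidates = [
        ub for ub in ring
        if ub.is_valid() and (max_txg is None or ub.txg <= max_txg)
    ]
    if not candidates:
        raise LookupError("no valid uberblock found")
    return max(candidates, key=lambda ub: ub.txg)


# Example: the newest uberblock (txg 103) was torn by a bad write.
ring = [write_uberblock(txg, f"root@{txg}".encode()) for txg in range(100, 104)]
ring[3].payload = b"torn write"
print(select_uberblock(ring).txg)               # 102: skips the corrupt one
print(select_uberblock(ring, max_txg=101).txg)  # 101: explicit rewind
```

The catch, in any copy-on-write filesystem, is that a rewind only helps while the blocks the older root points to haven't been reused yet.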

koverstreet 3 days ago

Rewinding transactions is cool. Bcachefs has that too :)

What happens on ZFS if you lose all your alloc info? Or are there other single points of failure besides the uberblock in the on-disk format?

magicalhippo 3 days ago

> What happens on ZFS if you lose all your alloc info?

According to this[1] old issue, it hasn't happened frequently enough to prioritize implementing a rebuild option; however, one should be able to import the pool read-only and zfs send it to a different pool.

As far as I can tell, that's still the status quo. I agree it's something that should be implemented at some point.

That said, certain other spacemap errors might be recoverable[2].

[1]: https://github.com/openzfs/zfs/issues/3210

[2]: https://github.com/openzfs/zfs/issues/13483#issuecomment-120...

koverstreet 3 days ago

I take a harder line on repair than the ZFS devs, then :)

If I see an issue that causes a filesystem to become unavailable _once_, I'll write the repair code.

Experience has taught me that there's a good chance I'll be glad I did, and I like the peace of mind that I get from that.

And it hasn't been that bad to keep up with, thanks to lucky design decisions. Since bcachefs started out as bcache, with no persistent alloc info, we've always had the ability to fully rebuild alloc info, and that's probably the biggest and hardest one to get right.

You can metaphorically light your filesystem on fire with bcachefs, and it'll repair. It'll work with whatever is still there and get you a working filesystem again with the minimum possible data loss.
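Rebuilding alloc info is tractable because it's derived data: walk every extent that still exists and recount what each bucket holds. Here is a minimal Python sketch of that idea under a simplified model where the only alloc info is a used-sector count per fixed-size bucket. `Extent`, `BUCKET_SIZE`, `rebuild_bucket_usage` and `find_mismatches` are hypothetical, and this is not bcachefs's actual code.

```python
from dataclasses import dataclass

BUCKET_SIZE = 8  # sectors per bucket; hypothetical, kept tiny for the example


@dataclass(frozen=True)
class Extent:
    start: int   # first sector on the device
    length: int  # number of sectors


def rebuild_bucket_usage(extents: list[Extent], nr_buckets: int) -> list[int]:
    """Recompute per-bucket used-sector counts purely from the extents that survive.
    Assumes every extent lies within the device (nr_buckets * BUCKET_SIZE sectors)."""
    used = [0] * nr_buckets
    for ext in extents:
        sector = ext.start
        end = ext.start + ext.length
        while sector < end:
            bucket = sector // BUCKET_SIZE
            bucket_end = (bucket + 1) * BUCKET_SIZE
            chunk = min(end, bucket_end) - sector  # part of the extent inside this bucket
            used[bucket] += chunk
            sector += chunk
    return used


def find_mismatches(on_disk: list[int], rebuilt: list[int]) -> list[int]:
    """Bucket indices where the stored alloc info disagrees with the rebuilt counts."""
    return [i for i, (a, b) in enumerate(zip(on_disk, rebuilt)) if a != b]


# Example: three extents, one of which straddles a bucket boundary.
extents = [Extent(0, 4), Extent(6, 5), Extent(20, 2)]
rebuilt = rebuild_bucket_usage(extents, nr_buckets=4)
print(rebuilt)  # [6, 3, 2, 0]
stale = [6, 0, 2, 1]  # pretend the on-disk alloc info got damaged
print(find_mismatches(stale, rebuilt))  # [1, 3]
```

If the on-disk alloc info is lost entirely, the rebuilt counts simply replace it; anything the extents no longer reference comes back as free space.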

magicalhippo 3 days ago

As I said, I do think ZFS is great, but there are aspects where it's quite noticeable that it was born in an enterprise setting. Treating "send, recreate and restore the pool" as a sufficient disaster recovery plan, one that doesn't warrant significant development, is one of those aspects.

As I mentioned in the other subthread, I do think your commitment to help your users is very commendable.

koverstreet 3 days ago

Oh, I'm not trying to diss ZFS at all. You and I are in complete agreement, and ZFS makes complete sense in multi-device setups with real redundancy and non-garbage hardware, which is what it was designed for, after all. Just trying to give honest assessments and comparisons.