RedoxFS is the default filesystem of Redox OS, inspired by ZFS(doc.redox-os.org)
165 points by doener 12 hours ago | 111 comments
ants_everywhere 6 hours ago | parent | next [-]

I'm not a filesystem person, but this sets off similar red flags to rolling your own encryption.

Isn't writing a robust file system something that routinely takes on the order of decades? E.g. reiserfs, bcachefs, btrfs.

Not to rain on anyone's parade. The project looks cool. But if you're writing an OS, embarking on a custom ZFS-inspired file system seems like the ultimate yak shaving expedition.

jillesvangurp 2 hours ago | parent | next [-]

Sometimes the fact that something is hard is a great reason to do it, to see whether the reasons it's hard are still valid. Doing a filesystem in Rust potentially mitigates some of those reasons. Most existing filesystems have gone through a lengthy stabilization phase where using them meant exposing yourself to nasty data corruption bugs, obscure race conditions, and other issues that, when you root-cause them, have a lot to do with the kinds of things Rust explicitly addresses (memory safety, safe concurrency, etc.). So there's a great argument for leveraging those features to make things easier and trying to build an awesome file system.

Worst case, this doesn't work. Best case, this works amazingly well. I think there's some valid reason for optimism here given other hard things that Rust has been used for in the past few years.

Galanwe 3 hours ago | parent | prev | next [-]

I see Redox as an incubator of new developments for a low-level Rust ecosystem. It's not a production-ready OS; its purpose is to spark new ideas, propose alternative implementations, try out new paths, etc. I see them implementing a ZFS variant as completely in line with this objective.

There need to be projects like that for any kind of innovation to happen.

smittywerben 3 hours ago | parent | prev | next [-]

I don't believe in "never roll your own encryption"; it's literally giving up. Does it make economic sense, or is it just for a hobby? That's more debatable. It's also a foil for "don't use regex to parse HTML" or whatever, where the thread gets closed for comments.

The filesystem is so deeply connected to the OS I bet there's a lot of horror around swapping those interfaces. On the contrary, I've never heard anything bad about DragonflyBSD's HAMMER. But it's basically assumed you're using DragonFlyBSD.

Would I keep a company's database on a new filesystem? No, nobody would know how to recover it from failed disk hardware.

This isn't really my area but a Rust OS using a ZFS-like filesystem seems like a lot of classic Linux maintainer triggers. What a funny little project this is. It's the first I've heard of Redox.

Edit: reminds me of "The Tar Pit" chapter from The Mythical Man-Month

> The fiercer the struggle, the more entangling the tar, and no beast is so strong or so skillful but that he ultimately sinks.

rmunn 3 hours ago | parent [-]

The "never create your own encryption" advice is specifically because crypto is full of subtle ways to get it wrong, which you will NOT catch on your own. It's a special case of "never use encryption that hasn't been poked at for years by hundreds of crypto specialists" — because any encryption you create yourself would fail that test.

Filesystems, as complex as they are, aren't full of traps like encryption is. Still plenty of subtle traps, don't get me wrong: you have to be prepared for all kinds of edge cases like the power failing at exactly the wrong moment, hardware going flaky and yet you have to somehow retrieve the data since it's probably the only copy of someone's TPS report, that sort of thing. But at least you don't have millions of highly-motivated people deliberately trying to break your filesystem, the way you would if you rolled your own encryption.
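For concreteness, here is the classic crash-safety dance that "power fails at exactly the wrong moment" thinking leads to: write a temp file, fsync it, rename over the target, fsync the directory. A minimal Rust sketch of the idea, not taken from any real filesystem:

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

/// Replace `target` with `data` so that a crash at any point leaves
/// either the old contents or the new contents, never a torn mix.
/// (Assumes POSIX rename semantics.)
fn atomic_replace(target: &Path, data: &[u8]) -> std::io::Result<()> {
    let tmp = target.with_extension("tmp");
    let mut f = File::create(&tmp)?;
    f.write_all(data)?;
    f.sync_all()?; // data must be durable *before* the rename is visible
    fs::rename(&tmp, target)?; // atomic swap of the directory entry
    // Persist the directory entry itself, or the rename can be lost on a power cut.
    if let Some(dir) = target.parent() {
        File::open(dir)?.sync_all()?;
    }
    Ok(())
}
```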

smittywerben an hour ago | parent [-]

That matches what I've heard, so I think you stated the trope perfectly. Your response is a good point about the actual difficulty. Perhaps I'm confused about what 'rolling your own encryption' means at an abstraction level. I just think it's weird that it comes up in an OS thread. Anyone who is serious about encryption is serious about the encryption hardware. At a higher level, WolfSSL limits the ciphers to a small, modern suite, which reduces the attack surface. Replacing OpenSSL is a fool's errand, I think; it's clearly the perfect implementation of OpenSSL, and it's a perfect security scapegoat. However, this is still about the x86 OS topic. Perhaps it's some TPM politics, similar to the decade-old stigma surrounding ZFS. Maybe I'm just questioning the limits of the x86 platform on any new operating system. Anyway, thanks for the response.

a-dub 4 hours ago | parent | prev | next [-]

i don't think it has to be all that robust yet as it mostly runs in vms (even though it may be!).

an internet community project to write an entire operating system from scratch using some newfangled programming language is literally the final boss of yak shaving. there is no reason to do it other than "it's fun" and of course writing a filesystem for it would be fun.

koverstreet 4 hours ago | parent [-]

Rust really is attractive to a filesystem developer. Over C, it brings generics for proper data structures, iterators (!), much better type safety, error handling - all the things Rust is good at are things you want.

For me, the things that would make it just perfect would be more ergonomic Cap'n Proto support (eliminate a ton of fiddly code for on disk data structures), and dependent types.
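As a taste of what those language features buy you - all names below are hypothetical, not from bcachefs or RedoxFS - here's a typed error enum plus an iterator over packed on-disk extent records, the kind of thing that takes pages of fiddly manual code in C:

```rust
/// Hypothetical filesystem error type: the compiler forces every call
/// site to handle (or explicitly propagate) each case.
#[derive(Debug)]
enum FsError {
    BadChecksum { block: u64 },
    CorruptNode { offset: u64 },
}

struct Extent {
    start: u64,
    len: u64,
}

/// Iterate over packed 16-byte extent records in a raw on-disk node.
struct ExtentIter<'a> {
    raw: &'a [u8],
}

impl<'a> Iterator for ExtentIter<'a> {
    type Item = Result<Extent, FsError>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.raw.len() < 16 {
            return None; // trailing partial record: end of node
        }
        let (rec, rest) = self.raw.split_at(16);
        self.raw = rest;
        let start = u64::from_le_bytes(rec[..8].try_into().unwrap());
        let len = u64::from_le_bytes(rec[8..].try_into().unwrap());
        if len == 0 {
            return Some(Err(FsError::CorruptNode { offset: start }));
        }
        Some(Ok(Extent { start, len }))
    }
}
```

And every standard combinator (`filter`, `take_while`, `collect` into a `Result`) then works on it for free.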

a-dub 3 hours ago | parent | next [-]

it remains an open question how reliable, performant and efficient a system built with these higher-level constructs would be compared to the highly optimized low-level stuff you'd see in a mature linux filesystem project.

i suspect the linux stuff would be far more space- and time-efficient, but we won't know until projects like this mature more.

koverstreet 3 hours ago | parent | next [-]

Eh? That's not an open question at all anymore; Rust has a drastically lower defect rate than C and good Rust is every bit as fast as good C.

Now, the engineering effort required to rewrite or redo in Rust, that's a different story of course.

a-dub 3 hours ago | parent [-]

i'd be curious how many of the higher level features and libraries would be best avoided if attempting to match the performance and space efficiency of a filesystem implemented in purpose designed highly optimized c.

cyberax 2 hours ago | parent [-]

I'm rewriting some of my Arduino projects into Rust (using Embassy and embedded-hal).

It's _so_ _much_ _better_. I can use async, maps, iterators, typesafe deserialization, and so on. All while not using any dynamic allocations.

With full support from Cargo for repeatable builds. It's night and day compared to the regular Arduino landscape of random libraries that are written in bad pseudo-object-oriented C++.
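For flavor, a minimal sketch of the shape of that code, assuming the embassy-executor, embassy-time, heapless, and panic-halt crates (read_sensor is a hypothetical stand-in for real peripheral code, and the executor needs the right target/feature flags):

```rust
#![no_std]
#![no_main]

use embassy_executor::Spawner;
use embassy_time::{Duration, Timer};
use heapless::Vec; // fixed-capacity vector: no heap, no allocator
use panic_halt as _;

/// Hypothetical stand-in for an ADC/peripheral read.
fn read_sensor() -> u16 {
    42
}

#[embassy_executor::main]
async fn main(_spawner: Spawner) {
    let mut samples: Vec<u16, 16> = Vec::new();
    loop {
        if samples.push(read_sensor()).is_err() {
            samples.clear(); // buffer full: a bound we chose, not an OOM
        }
        // Cooperative async delay: no RTOS, no busy-wait, no allocation.
        Timer::after(Duration::from_millis(500)).await;
    }
}
```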

IshKebab 3 hours ago | parent | prev [-]

Yeah it's only an open question if you have your eyes closed.

IshKebab 3 hours ago | parent | prev [-]

To be fair, Cap'n Proto's C++ API is hardly ergonomic.

koverstreet 2 hours ago | parent [-]

Doing it right needs lenses (from Swift)

madushan1000 2 hours ago | parent | prev | next [-]

Redox OS is a microkernel operating system, completely different from monolithic kernels like Linux or the BSDs. I doubt it'll be easy to get existing ZFS drivers working on it at all.

MangoToupe 5 hours ago | parent | prev [-]

Isn't btrfs itself just a ZFS-inspired filesystem? If that can manage to find a foothold, why can't this?

koverstreet 5 hours ago | parent [-]

The only thing btrfs took from ZFS was the feature set - COW, data checksumming, snapshots, multi-device. ZFS was a much more conservative design; btrfs is based on COW b-trees (with significant downsides), and if you had to put it in any lineage it would be reiserfs.

cayleyh 11 hours ago | parent | prev | next [-]

"because of the monolithic nature of ZFS that created problems with the Redox microkernel design"

Anyone have an idea what this actually means and what problems they were having in practice?

AndrewDavis 8 hours ago | parent | next [-]

I can only speculate, but maybe they're referring to the same thing Andrew Morton meant when he described ZFS as a rampant layering violation.

i.e. ZFS isn't just a file system. It's a volume manager, RAID, and file system rolled into one holistic system, vs for example LVM + MD + ext4.

And (again, I'm only speculating) their microkernel design wants individual components running separately, layered together into a complete solution.

p_l 3 hours ago | parent [-]

It's only a rampant layering violation if you mandate the use of external layers like the Linux device mapper as the only allowed way... or if you haven't actually read through the code and assume based on the external user interface.

No, ZFS is not "monolithic".

It's just that on the outside you have a well-integrated user interface that does not expose you to the SPA (block layer), ZIO (I/O layer - that one is a bit intersectional, but still a component others call), DMU (object storage), and finally ZVOL (block storage emulated over the DMU) and ZPL (POSIX-compatible filesystem on top of the DMU), or Lustre-ZFS (Lustre metadata and object stores implemented on top of the DMU). There are also a few utility components that are effectively libraries (AVL trees, a key-value data serialization library, etc.).
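To make the shape of that concrete, here's a toy Rust sketch of the layering described above. The names mirror the ZFS components, but the traits are purely illustrative, not OpenZFS's actual interfaces:

```rust
/// SPA/ZIO territory: checksummed block I/O over a pool of vdevs.
trait BlockLayer {
    fn read_block(&self, addr: u64, buf: &mut [u8]) -> std::io::Result<()>;
    fn write_block(&mut self, addr: u64, data: &[u8]) -> std::io::Result<()>;
}

/// DMU territory: a transactional object store built on the block layer.
trait ObjectStore {
    fn read(&self, obj: u64, off: u64, buf: &mut [u8]) -> std::io::Result<()>;
    fn write(&mut self, obj: u64, off: u64, data: &[u8]) -> std::io::Result<()>;
}

/// ZPL territory: a POSIX-ish filesystem implemented purely against the
/// object store; it never touches the block layer directly.
struct PosixLayer<S: ObjectStore> {
    dmu: S,
}

/// ZVOL territory: a virtual block device, also purely over the object store.
struct VolumeLayer<S: ObjectStore> {
    dmu: S,
}
```

The point being: the components stack cleanly; it's only the zfs/zpool user interface that presents them as one monolith.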

panick21_ 3 hours ago | parent [-]

In the Linux world you need to be hard to use in order to prove how pure you are. Anything that is actually easy to use is always considered unpure and bad.

creshal 2 hours ago | parent [-]

Not sure why you're getting downvoted, considering how people torture themselves with calculating SSD cache sector offsets by hand so they can imitate 1% of ZFS's feature set with LVM2.

evanjrowley 11 hours ago | parent | prev | next [-]

Good question. I don't know about other microkernels, but NetBSD is a small kernel that supports ZFS. The support has been there since 4.0.5 and 5.3[0], possibly earlier too. I'm not adept at navigating the mailing lists here, but I imagine a good place to learn about the challenges of porting ZFS to a smaller kernel would be the NetBSD and ZFS lists from that era (2008-2009). What NetBSD does today is use a 'zfs' module that depends on a 'solaris' kernel module. The dependency on Solaris primitives is probably one of the major challenges with porting ZFS to any kernel. FWIW, somehow a ZFS port for the "hybrid" kernel in Windows also exists[1].

[0] https://vermaden.wordpress.com/2022/03/25/zfs-compatibility/

[1] https://github.com/openzfsonwindows/openzfs

adastra22 11 hours ago | parent [-]

NetBSD isn’t a microkernel.

pests 7 hours ago | parent [-]

Who is calling it a microkernel? The post you're replying to calls it a “small kernel” - that doesn't imply it's a microkernel, right? I didn't think size had anything to do with it.

Dylan16807 6 hours ago | parent [-]

I'm not sure if it originally said small kernel, though I know for sure the italics weren't originally there. The wording is unclear in a couple ways.

pests 6 hours ago | parent [-]

I came back to maybe delete my comments as I felt I might have come off harsh, especially before I saw the dead comment chain. No ill will; I was confused as well, I think.

StrangeDoctor 7 hours ago | parent | prev | next [-]

I don't think it's microkernels in general but their microkernel design, which wants as much as possible in userspace. They want each component to have its own memory space. ZFS blurs the layers between the filesystem and volume management, which bothers the layers-of-abstraction folks. And I assume that, combined with their POSIX-like model, it just sorta clashes with what they want to do. Not impossible to integrate, but they want something a little different.

aidenn0 10 hours ago | parent | prev | next [-]

That seems odd to me too. It seems like they could have put all of ZFS (and SPL) in a single system service.

yjftsjthsd-h 9 hours ago | parent [-]

I particularly don't buy it because ZFS used to have a FUSE build, and I'm pretty sure there's at least one company still running it in userspace in some form (something for k8s, IIRC?)

Neikius 3 hours ago | parent | prev | next [-]

The main selling point of ZFS is it being monolithic. Because of that it can optimize many things that are impossible in a layered approach.

jandrewrogers 3 hours ago | parent | prev [-]

If I had to guess, it is because ZFS likes to insert itself into things beyond just being a filesystem. It is one of the reasons ZFS notoriously works poorly with database engines, which have a tendency to bypass or ignore the filesystem (for good reason). It is a design choice on the part of ZFS.

panick21_ 3 hours ago | parent [-]

Oracle specifically came back to Sun because they had ZFS-based servers. So that seems a bit strange to me.

mnw21cam 28 minutes ago | parent | prev | next [-]

> File/directory size limit up to 193TiB (212TB)

This would be a significant problem with my use case in the very near future. I already have double-digit-TB files, and that doesn't look like much margin on top of that.

scoopr 2 hours ago | parent | prev | next [-]

I've occasionally pondered how feasible it would be to write an APFS implementation from the specs[0] alone. Is it harder or easier to create an implementation when you're handed the layout and the mechanism of how it works? Would it be easy to keep compatibility, and would it be a dead-end design for the extensions you'd like?

[0] https://developer.apple.com/support/downloads/Apple-File-Sys...

pluto_modadic 6 hours ago | parent | prev | next [-]

The developer is also kind, which makes this awesome.

ladyanita22 11 hours ago | parent | prev | next [-]

Redox is shaping up to be the most advanced OSS alternative to Linux apart from BSDs.

samtheDamned 10 hours ago | parent | next [-]

Yeah, I'd always written this off as a fun side project for a group of people, but after seeing consistent updates and improvements over the last several years, I've been so impressed by how far this project has come.

edoceo 10 hours ago | parent [-]

I feel like I read that exact quote, 25+ years ago about Linux.

I admire these projects & the teams for their tenacity.

Four bells! Damn the torpedoes.

wraptile 4 hours ago | parent | prev | next [-]

I feel like the MIT license will prevent this from ever becoming a Linux alternative, unless of course they switch to something more sane later on.

qalmakka 2 hours ago | parent | next [-]

Linux didn't win because it was GPL'd; it won because it was the only real alternative back in '92. The BSDs were all caught up in the moronic USL/BSDi lawsuit of the time, otherwise we'd all be using FreeBSD or some other 386BSD variant today instead of Linux. The GPL was a nice bonus, but it isn't the real secret sauce that powered Linux's growth; it was mostly good timing.

That said, I'd rather see some form of copyleft in place (like the MPLv2), or at least a licence with some kind of patent protection baked in (like the Apache 2.0); the X11/MIT licences are extremely weak against patent trolls.

bigstrat2003 4 hours ago | parent | prev [-]

There's nothing insane about MIT. It may not be your preference, but that's not the same as insane.

wraptile 3 hours ago | parent | next [-]

Other licenses being more sane doesn't imply MIT is _insane_ per se. It's just not a very sane option for cooperation, and it has a very real possibility of driving someone insane. Imagine working on Redox OS for years with your friends, and then Microsoft takes your work, rebrands it as Windows 19, completely steals the market from you, and silences you through legal pressure without even crediting your work. All of this is very much possible, and similar scenarios have happened before.

MIT is for education, not cooperation.

omnimus 3 hours ago | parent | prev [-]

I am not a native speaker, but saying that something is more sane doesn't mean the person thinks the other option is insane (the extreme end of the scale).

It can mean both options are sane (reasonable) and one is just more reasonable. It can also mean both options are insane (unreasonable) and one is just less so.

snvzz 11 hours ago | parent | prev | next [-]

You might not be aware of Genode[0].

0. https://genode.org/

Rochus 9 hours ago | parent [-]

Genode looks interesting. As far as I understand, it uses the seL4 kernel? Has it really been in development since 2008?

wucke13 23 minutes ago | parent [-]

It doesn't necessarily, but it can. Genode/Sculpt OS is kind of a microkernel OS framework, and it can use seL4 as the kernel.

Here is a talk about that porting effort:

https://m.youtube.com/watch?v=N624i4X1UDw

NewJazz 11 hours ago | parent | prev | next [-]

Fuchsia?

stevefan1999 9 hours ago | parent | next [-]

Fuchsia, or the Zircon kernel to be specific, has been pretty much dead since Google's last round of layoffs.

laxd 9 hours ago | parent | next [-]

If it's dead, why is it moving so much? https://fuchsia.googlesource.com/fuchsia/+log

SV_BubbleTime 8 hours ago | parent [-]

As of writing this, last commit 45 seconds ago. On the other hand, if you scan the names, it’s like 5 of the same people.

I agree, can’t say “dead” but it is a Google project so it’s like being born with a terminal condition.

surajrmal 3 hours ago | parent | next [-]

It's far more active than Redox and it's actually running on real consumer devices. There are more than a hundred monthly active committers on the repo you were looking at, and that's not the only repo Fuchsia has. Calling it dead or prone to dying is simply not based on any objective reality.

afavour 6 hours ago | parent | prev [-]

Right now it’s looking like 6-7 commits per hour… it’s not nothing

NewJazz 9 hours ago | parent | prev [-]

Aww fudge. We're cooked.

happymellon 3 hours ago | parent | prev [-]

Fuchsia is literally a Google project to avoid using Linux.

Look at their other "open source" projects like Android to understand why they would want to ensure they avoid GPL code. It's all about control, and the appearance of open source maintained by gaslighting with source-available code.

dardeaup 10 hours ago | parent | prev [-]

Interesting! Can you elaborate?

cyboru 11 hours ago | parent | prev | next [-]

> Redox had a read-only ZFS driver but it was abandoned because of the monolithic nature of ZFS that created problems with the Redox microkernel design.

Curious about the details behind those compatibility problems.

arghwhat 11 hours ago | parent | next [-]

If it relied on OpenZFS, then I wouldn't be too surprised.

The whole ARC thing for example, sidestepping the general block cache, feels like a major hack resulting from how it was brutally extracted from Solaris at the time...

The way ZFS just doesn't "fit" was why I had hope for btrfs... ZFS is still great for a file server, but I wouldn't use it on a general-purpose machine.

drewg123 9 hours ago | parent | next [-]

Solaris had a unified page cache, and the ARC existed separately, alongside it, there as well.

One huge problem with ZFS is that there is no zero copy due to the ARC wart. E.g., if you're doing sendfile() from a ZFS filesystem, every byte you send is copied into a network buffer. But if you're doing sendfile from a UFS filesystem, the pages are just loaned to the network.

This means that on the Netflix Open Connect CDN, where we serve close to the hardware limits of the system, we simply cannot use ZFS for video data due to ZFS basically doubling the memory bandwidth requirements. Switching from UFS to ZFS would essentially cut the maximum performance of our servers in half.
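For reference, this is the zero-copy path in question. A minimal sketch of the Linux flavor using the libc crate (the FreeBSD sendfile Netflix uses has a different signature):

```rust
use std::fs::File;
use std::net::TcpStream;
use std::os::unix::io::AsRawFd;

/// Ask the kernel to move `len` bytes of `file` to `sock` directly.
/// With a page-cache-backed filesystem the pages are loaned to the
/// network stack; no byte is copied through userspace.
fn send_file(sock: &TcpStream, file: &File, len: usize) -> std::io::Result<()> {
    let mut off: libc::off_t = 0;
    while (off as usize) < len {
        let n = unsafe {
            libc::sendfile(
                sock.as_raw_fd(),
                file.as_raw_fd(),
                &mut off, // the kernel advances this offset for us
                len - off as usize,
            )
        };
        if n < 0 {
            return Err(std::io::Error::last_os_error());
        }
        if n == 0 {
            break; // EOF reached before `len` bytes
        }
    }
    Ok(())
}
```

The point above is that on ZFS the syscall still works, but the kernel has to copy each page out of the ARC on this path, so the zero-copy benefit evaporates.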

johannes1234321 9 hours ago | parent | prev | next [-]

Even on Solaris the ARC existed. ZFS replaces a lot of systems traditionally not directly related to a Filesystem implementation.

For instance, using the `zfs` tool one doesn't only configure file system properties but also controls NFS exports, which traditionally was done using /etc/exports.

p_l 3 hours ago | parent [-]

This was done as part of a major UI/UX reshaping in Solaris 10 to make sysadmins' lives easier; what it ultimately does is... edit the exports file.

The zfs and zpool tools provide access to multiple different subsystems in ways that make more sense to the end user, a lot like LVM and LUKS do on top of the device mapper these days.

goku12 9 hours ago | parent | prev | next [-]

Can you elaborate on the last paragraph? In what way doesn't ZFS fit? (I couldn't make it out from the first two paragraphs.) Where did btrfs fall short of your expectations? Why would you avoid ZFS on general-purpose machines if you deem it good enough for file servers?

goku12 2 hours ago | parent [-]

@arghwhat: To clarify, this isn't a rhetorical question. I'm interested in your technical insight on the subject - especially the comparisons.

pmarreck 8 hours ago | parent | prev [-]

I've been booting off ZFS-on-root for years.

jdjrbrjrbrh 9 hours ago | parent | prev [-]

ZFS relies on Solaris (Unix) kernel primitives IIRC ... I remember hearing that to get ZFS to work with an OS you basically have to implement a good portion of the Solaris kernel interface as shims.

loeg 6 hours ago | parent | prev | next [-]

Does anyone have more context? This appears to be a very short, high-level blurb.

adastra22 11 hours ago | parent | prev | next [-]

How is Redox OS on actual hardware? Are there laptops with good support?

kimixa 10 hours ago | parent [-]

It doesn't currently have any GPU support (for example) - and even for a pretty simple desktop, CPU rendering is rather incompatible with battery life or performance in a laptop form factor.

adastra22 10 hours ago | parent [-]

Not even Intel integrated GPU? Ugh.

hsbauauvhabzb 10 hours ago | parent [-]

The project does state it's not ready to be used in any form factor (server, desktop, etc).

adastra22 9 hours ago | parent [-]

Well, I’d be willing to develop and contribute to it, but I have absolutely no interest whatsoever in just running in virtualization.

kimixa 5 hours ago | parent | next [-]

It still supports display out through the UEFI framebuffer, so it's technically usable; you just likely wouldn't want it as a daily driver.

hsbauauvhabzb 5 hours ago | parent | prev [-]

That’s your call to make.

n3storm 4 hours ago | parent | prev | next [-]

I hope someone can bring this issue to the redox-os project about its package management command "pkgar". Read aloud in Spanish, it sounds like "pa cagar". "pa" is a very common contraction of "para", so we have "para cagar", which translates back as "to shit".

Sorry for commenting on this here; Redox is using a private GitLab instance I have no access to.

dralley 11 hours ago | parent | prev | next [-]

It would be more interesting to see bcachefs picked up there

koverstreet 11 hours ago | parent [-]

If someone's interested in working on a port, that'd be an interesting conversation.

Gabrys1 4 hours ago | parent | prev | next [-]

`fusermount3 ./redox-img`

at the end of the page should read

`fusermount3 -u ./redox-img`

Y_Y 40 minutes ago | parent [-]

For that matter, the "./file" pattern is only required to disambiguate executables in the local directory, so the shell doesn't try to look them up in the PATH. For arguments, like here, it's redundant.

Modified3019 12 hours ago | parent | prev | next [-]

According to this https://www.redox-os.org/faq/ it looks like snapshots are planned.

fn-mote 8 hours ago | parent | prev | next [-]

Innovation is wonderful, but it's hard to believe this has enough users to flush out the challenging bugs. Maybe if it had some kind of correctness proof, but there just seem to be way too many subtle bugs in file systems in general for me to try a new FS.

rorychatt 7 hours ago | parent [-]

Building out test infrastructure for correctness to support the project sounds like a fantastic idea.

That said, while it's compatible with Linux via fuse, unless you're helping to build RedoxOS, I don't think there's any real expectation that you would try it.

snvzz 11 hours ago | parent | prev | next [-]

>File/directory quantity limit up to 4 billion per 193TiB (2^32 - 1 = 4294967295)

32-bit inodes? Why?

Other systems had to go through pains to migrate to 64-bit. Why not skip that?

adgjlsfhk1 11 hours ago | parent [-]

Ext4 and NTFS both have a 2^32-1 limit on number of files as well. Realistically, you never actually want to make tons of files, so I have a pretty hard time seeing this being an issue in practice.

hexo 10 hours ago | parent [-]

Why not?

Dylan16807 6 hours ago | parent | next [-]

Piles of small files are unpleasant to deal with. Going over millions of files even without touching the contents gets annoying. Trying to back up or move big directories gets worse. If you have a hard drive involved it really gets bad, it can probably seek 10 million times in an entire day.

adgjlsfhk1 9 hours ago | parent | prev [-]

Files in nested folders are primarily an abstraction for humans. They are a maximally flexible and customizable system. This has substantial costs (especially in environments with parallel work). As such, no one really has millions of pieces of fully separate, unstructured, hierarchical data. Once you have that much data, there is almost always additional structure that would be better represented in something like a database where you can actually express the invariants that you have.

hexo 4 minutes ago | parent | next [-]

A filesystem is essentially a "simple" database. If it is not performing, then it is not a good DB. It shouldn't really matter how many files you have if the metadata, and the indexing of that metadata, is done properly (i.e. like in a good DB). It also has benefits a DB usually doesn't even offer, because they aren't practical there at all (like random access).

pitched 9 hours ago | parent | prev [-]

Aren't block sizes (and thus the minimum file size) normally around 4 KiB? So the max number of 1-byte files would take up around 16 TiB, without adding any overhead. Those drives are available these days.
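The back-of-envelope, for what it's worth (assuming 4 KiB blocks and one block per file):

```rust
// Worst case for the limit being discussed: 2^32 - 1 one-byte files.
const MAX_FILES: u64 = u32::MAX as u64; // 4_294_967_295 inodes
const BLOCK_SIZE: u64 = 4096; // one 4 KiB block per 1-byte file
const TOTAL_BYTES: u64 = MAX_FILES * BLOCK_SIZE; // 17_592_186_040_320
// TOTAL_BYTES is just under 16 TiB (~17.6 TB) of disk for ~4 GiB of data.
```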

adgjlsfhk1 8 hours ago | parent | next [-]

Many file systems support sub-block allocation

mastax 8 hours ago | parent | prev [-]

Nobody wants to store 2^32 1-byte files, and if you do, you can make your own file system, frankly.

jhack 12 hours ago | parent | prev | next [-]

No transparent compression?

seanw444 12 hours ago | parent [-]

According to the bottom of their landing page [1], it's on the roadmap.

[1] https://www.redox-os.org/

zxspectrum1982 12 hours ago | parent | prev [-]

Why? Why not simply adopt btrfs?

johncolanduoni 12 hours ago | parent | next [-]

Well they’d have to write their own driver anyway for one. If they were going to take an existing design and write a new driver, ZFS would be the better choice by far. Much longer and broader operational history and much better documentation.

MadnessASAP 11 hours ago | parent [-]

And you might not get sued by Oracle! Redox OS seems to use the MIT license while OpenZFS is under the CDDL. Given Oracle's litigious nature, they'd have to make sure none of their code looked like OpenZFS code - better yet, make sure none of the developers had ever even looked at the ZFS code.

It's much better to hope that OpenZFS decides to create a Redox OS implementation themselves than to try and make a clean-room ZFS implementation.

johncolanduoni 9 hours ago | parent [-]

Fair enough, though you can't really understand how btrfs works without reading the GPLed Linux source, while ZFS has some separate disk format documentation. Don't know that anyone would sue you, though.

MadnessASAP 4 hours ago | parent [-]

It's not unreasonable to look at the source code to understand the disk format and then create an independent driver, so long as you are not directly copying code (or, in this case, paraphrasing C into Rust).

More importantly though, Linux or the Linux Foundation are unlikely to file a lawsuit without clear evidence of infringement, whereas Oracle, by their nature, will have filed a lawsuit and a dozen motions if they catch even a whiff of possible infringement.

I wouldn't touch Oracle IP with a 50' fibreglass pole while wearing rubber boots.

craftkiller 11 hours ago | parent | prev | next [-]

License is the obvious blocker, aside from all the technical issues[0]. Btrfs is GPL, RedoxOS is MIT, ZFS is CDDL. You can integrate CDDL into an MIT project without problems[1], but due to the viral nature of the GPL, integrating btrfs would have impacts on the rest of the project.

What I'm wondering is what about HAMMER2? It's under a copyfree license and it is developed for a microkernel operating system (DragonflyBSD). Seems like a natural fit.

[0] btrfs holds the distinction of being the only filesystem that has lost all of my data, and it managed to do it twice! Corrupt my drive once, shame on you. Corrupt my drive twice, can't corrupt my drive again.

[1] further explanation: The CDDL is basically "the GPL but it only applies to the files under the CDDL, rather than the whole project". So the code for ZFS would remain under the CDDL and it would have all the restrictions that come with that, but the rest of the code base can remain under MIT. This is why FreeBSD can have ZFS fully integrated whereas on Linux ZFS is an out-of-tree module.

phire 10 hours ago | parent | next [-]

> Corrupt my drive twice, can't corrupt my drive again.

Exact same drive? You might want to check that drive isn't silently corrupting data.

I still blame btrfs; something very similar happened to me.

I had a WD Green drive with a known flaw where it would just silently zero data on writes in some random situations. EXT4 worked fine on this drive for years (the filesystem was fine; my files had random zeroed sections). But btrfs just couldn't handle this situation and immediately got itself into an unrecoverable state; scrub and fsck just couldn't fix the issue.

In one way, I was better off. At least I now knew that drive had been silently corrupting data for years. But it destroyed my confidence in btrfs forever. Btrfs didn't actually lose any additional data for me: it was in RAID and the data was all still there, so it should have been able to recover itself.

But it simply couldn't. I had to manually use a hex editor to piece a few files back together (and restore many others from backup).

Even worse, when I talked to people on the #btrfs IRC channel, not only was nobody surprised that btrfs had borked itself due to bad hardware, but everyone recommended that a btrfs filesystem that had been borked could never be trusted. Instead, the only way to get a trustworthy, clean, and canonical btrfs filesystem was to delete it and start from scratch (this time without the stupid faulty drive).

Basically, btrfs appears to be not fit for purpose. The entire point of such a filesystem is that it should be able to run in adverse environments (like faulty hardware) and be tolerant to errors. It should always be possible to repair such a filesystem back to a canonical state.

bigstrat2003 4 hours ago | parent | next [-]

I too have had data loss from btrfs. I had a RAID-1 array where one of the drives started flaking out; sometimes it would disappear when rebooting the system. Unfortunately, before I could replace the drive, the array came up corrupted on one boot and it was unrecoverable (or at least unrecoverable at my skill level). This wasn't a long time ago either; this was within the last 2-3 years. When I got the new drive and rebuilt the array, I used ZFS and it has been rock solid.

pmarreck 6 hours ago | parent | prev | next [-]

I wrote a tool to try to attack this specific problem (subtle, random drive corruption) in the general sense: https://github.com/pmarreck/bitrot_guard. But it requires re-running it for any modified files, which makes it mainly suitable for long-term archival purposes. I'm not sure why one of these filesystems doesn't just invisibly include/update some par2 or other parity data, so you'd at least get some unexpected-corruption protection/insurance (plus notification when things are starting to go awry).

cmurf 3 hours ago | parent | prev [-]

> Basically, btrfs appears to be not fit for purpose. The entire point of such a filesystem is that it should be able to run in adverse environments (like faulty hardware) and be tolerant to errors. It should always be possible to repair such a filesystem back to a canonical state.

Pretty sure all file systems and their developers are unsurprised by file system corruption occurring on bad hardware.

There are also drives that report a successful flush or FUA when the expected (meta)data is not yet on stable media. That results in out-of-order writes. There's no consequence unless there's a badly timed crash or power failure; in that case there are out-of-order writes and possibly dropped writes (whatever was left in the write cache).

File system developers have told me that their designs do not account for drives miscommunicating that a flush/FUA succeeded when it hasn't. This is like operating under nobarrier some of the time.

Overwriting file systems' metadata has fixed locations, so quite a lot of assumptions can be made during repair about what should be there, inferring it from metadata in other locations.

Btrfs has no fixed locations for metadata. This leads to unique flexibility, and to repair difficulty. Flexible: being able to convert between different block group profiles (single, dup, and all the raids), run on unequal-sized drives, and convert from any file system anybody wants to write the code for - because only the per-device super blocks have fixed locations. Everything else can be written anywhere else. But the repair utility can't make many assumptions. And if the story told by the metadata that is present isn't consistent, the repair necessarily must fail.

With Btrfs the first step is read-only rescue mount, which uses backup roots to find a valid root tree, and also the ability to ignore damaged trees. This read-only mount is often enough to extract important data that hasn't been (recently) backed up.

Since moving to Btrfs by default in Fedora almost 10 releases ago, we haven't seen more file system problems. One problem we do see more often is evidence of memory bitflips. This makes some sense because the file system metadata isn't nearly as big a target as data. And since both metadata and data are checksummed, Btrfs is more likely to detect such issues.

phire 2 hours ago | parent [-]

To be clear, I'm not expecting btrfs (or any filesystem) to avoid corrupting itself on unreliable hardware. I'm not expecting it to magically avoid unavoidable data loss.

All I want is an fsck that I can trust.

I love that btrfs will actually alert me to bad hardware. But then I expect to be able to replace the hardware and run fsck (or scrub, or whatever) and get back to the best-case healthy state with minimal fuss. And by "healthy" I don't mean ready for me to extract data from, I mean ready for me to mount and continue using.

In my case, I had zero corrupted metadata, and a second copy of all data. fsck/scrub should have been able to fix everything with zero interaction.

If files/metadata are corrupted, fsck/scrub should provide tooling for how to deal with them. Delete them? Restore them anyway? Manual intervention? IMO, failure is not a valid option.

boricj 4 hours ago | parent | prev | next [-]

> License is the obvious blocker, aside from all the technical issues. Btrfs is GPL

WinBtrfs [1], a reimplementation of btrfs from scratch for Windows systems, is licensed under the LGPL v3. Just because the reference implementation uses one license doesn't mean that others must use it too.

[1] https://github.com/maharmstone/btrfs

koverstreet 9 hours ago | parent | prev | next [-]

License isn't a blocker for a microkernel, with the filesystem being a completely separate service.

aidenn0 10 hours ago | parent | prev | next [-]

Last time I looked at DragonflyBSD, it was kind of an intermediate between a traditional kernel and a microkernel. There certainly was a lot more in the kernel as compared to systems built on e.g. L4.

There certainly is a continuum. I've always wanted to build a microkernel-ish system on top of Linux that only has userspace options for block devices, file systems and tcp/ip. It would be dog-slow but theoretically work.

stavros 10 hours ago | parent | prev [-]

You mean because the CDDL files would have to be licensed under the GPL, and that's not compatible with the CDDL? I assume MIT-licensed files can be relicensed as GPL; that's why that mix is fine?

craftkiller 9 hours ago | parent [-]

Yes, if ZFS (CDDL) was integrated into Linux (GPL) then the GPL would need to apply to the CDDL files, which causes a conflict because the CDDL is not compatible with the GPL.

This isn't a problem when integrating MIT code into a GPL project, because MIT's requirements are a subset of the GPL's requirements, so the combined project being under the GPL is no problem. (Going the other way, integrating GPL code into an MIT project, is technically also possible, but it would convert that project into a GPL project, so most MIT projects would resist this.)

This isn't a problem combining MIT and CDDL because both lack the GPL's virality. They can happily coexist in the same project, leaving each other alone.

(obligatory: I am not a lawyer)

Brian_K_White 5 hours ago | parent [-]

And that's why zfs inches along with a fraction of the progress it could have had for decades.

This lack of required reciprocity and the virtuous-sounding "leave each other alone" is no virtue at all. It doesn't harm anyone else, at least, which is great, but it's also shooting itself in the foot and a waste.

hsbauauvhabzb 12 hours ago | parent | prev [-]

Why not use ext2 or fat16?