imiric 5 days ago

The scenarios you mentioned are indeed nice use cases of ZFS, but other tools can do this too.

I can make snapshots and recover files with SnapRAID or Kopia. In the case of a laptop system drive failure, I have scripts to quickly set up a new system and restore data from backups. Sure, the new system won't be a bit-for-bit replica of the old one, and I'll have to manually tinker to get everything back in order, but these scenarios are so uncommon that I'm fine with this taking a bit more time and effort. I'd rather have that than rely on a complex filesystem whose performance degrades over time and which is difficult to work with and understand.
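For concreteness, the snapshot-and-recover workflow with Kopia looks roughly like this (a sketch, not a full setup; the repository path and source directory are illustrative, and `<snapshot-id>` comes from the list output):

```shell
# Illustrative Kopia workflow; paths are examples, not a recommendation.
kopia repository create filesystem --path=/mnt/backup/repo
kopia snapshot create ~/Documents        # take a point-in-time snapshot
kopia snapshot list                      # find the snapshot ID to restore
kopia restore <snapshot-id> ~/restored   # recover files into a new directory
```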

You speak about ZFS as if it's a silver bullet, and everything else is inferior. The reality is that every technical decision has tradeoffs, and the right solution will depend on which tradeoffs make the most sense for any given situation.

kkfx 5 days ago | parent [-]

How often do you test your OS replication script? I used to do that too, and every time there was always something broken, outdated, or needing modification, often right when I desperately needed a restore because I was about to leave on a business trip and had a flight to catch with a broken laptop disk.

How much time do you spend setting up a desktop and maintaining it with mdraid+LUKS+LVM+your choice of filesystem, replacing a disk and rebuilding the array, or making backups with SnapRAID/Kopia, etc.? Again, I used to do that. I stopped after finding better solutions, also because I always had issues during restores, maybe small ones, but they were there, and when it's not a test but a real restore, the last thing you want is problems.

Have you actually tested your backup with a sudden, unplanned restore, without three days to think it over first? Do you do it at least once a year to make sure everything works, or do you just hope that, since computers rarely fail and restores take a long time, everything will work when you need it? When I did things the way you do, and among the people I know who still do, practically no one ever tested their restore, and the recovery script was always one major distro release behind; you had to modify it every few releases when doing a fresh install. In the meantime, it's "hope everything goes well or spend a whole day scrambling to fix things."
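A restore drill doesn't have to be elaborate, either: even a throwaway script that backs up a directory, destroys the original, restores it, and diffs the result catches most surprises. A minimal sketch with plain tar standing in for whatever backup tool you actually use (paths are illustrative):

```shell
# Minimal restore drill: back up, destroy the original, restore, verify.
# tar is a stand-in here for your real backup tool.
set -eu

workdir="${TMPDIR:-/tmp}/restore-drill"
rm -rf "$workdir"
mkdir -p "$workdir/data"
echo "important notes" > "$workdir/data/notes.txt"

# "Backup": archive the data directory.
tar -czf "$workdir/backup.tar.gz" -C "$workdir" data

# Simulate the disk failure: the original data is gone.
rm -rf "$workdir/data"

# "Restore": unpack the archive and check the contents actually came back.
tar -xzf "$workdir/backup.tar.gz" -C "$workdir"
grep -q "important notes" "$workdir/data/notes.txt" && echo "restore OK"
```

The point is not the tool but the habit: if this runs unattended on a schedule and fails loudly, you find out your backup is broken before you need it.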

Maybe a student is okay with that risk and enjoys fixing things, but generally it's definitely not best practice, and that's why most people end up on someone else's computer, called the cloud, as protection from their own IT choices...

imiric 5 days ago | parent [-]

> How often do you test your OS replication script?

Not often. It's mostly outdated, and I spend a lot of time bringing it up to date when I have to rely on it.

BUT I can easily understand what it does, and the tools it uses. In practice I use it rarely, so spending a few hours a year updating it is not a huge problem. I don't have the sense of urgency you describe, and when things do fail, it's an extraordinary event where everything else can wait for me to be productive again. I'm not running a critical business, these are my personal machines. Besides, I have plenty of spare machines I can use while one is out of service.

This is the tradeoff I have decided to make, which works for me. I'm sure that using ZFS and a reproducible system has its benefits, and I'm trying to adopt better practices at my own pace, but all of those have significant drawbacks as well.

> Have you actually tested your backup by doing a sudden, unplanned restore without thinking about it for three days before?

No, but again, I'm not running a critical business. Things can wait. I would argue that even in most corporate environments the obsession over HA comes at the expense of operational complexity, which does more harm than sticking to boring tools and technology would. Few companies need Kubernetes clusters and IaC tools, and even fewer people need ZFS and NixOS for personal use. It would be great if the benefits of these tools were accessible to more people with fewer drawbacks, but the technology is not there yet. You shouldn't gloss over these issues because they're not issues for you.

kkfx 5 days ago | parent [-]

Most companies have terrible infrastructure; they're hardly ever examples to follow. But they also have it because there's a certain widespread mentality among those who work there, which originates on the average student's desktop, where they play with Docker instead of understanding what they're using. This is the origin of many modern software problems: the lack of proper IT training in universities.

MIT came up with "The Missing Semester of Your CS Education" to compensate, but it's nothing compared to what's actually needed. It's assumed that students will figure it out on their own, but that almost never happens, at least not in recent decades. It's also assumed that it's something easy to do on your own, that it can be done quickly, which is certainly not the case, and I don't think it ever has been. And the teacher who never learned it himself is the first to hold that bias.

The exceptional event, even if it doesn't require such a rapid response, still reveals a fundamental problem in your setup. So the question should be: why maintain this complex script when you can do less work with something else? NixOS and Guix are tough nuts to crack at first: NixOS because of its language and its poor, outdated documentation; Guix because its development is centered away from the desktop and it lacks some elements common in modern distros. But once you learn them, there's much less overhead to solve problems and keep everything updated, much less than maintaining custom scripts.
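To make the comparison concrete: where the script imperatively redoes each setup step, a declarative system states the end result once and the rebuild is handled by the distro. A minimal, hypothetical NixOS fragment (the option names are real NixOS options; the specific values, packages, and user are invented for illustration):

```nix
# /etc/nixos/configuration.nix (fragment) -- a declarative stand-in for a
# hand-maintained setup script. `nixos-rebuild switch` realizes this state,
# and distro upgrades rebuild the same description instead of breaking a script.
{ config, pkgs, ... }:
{
  boot.loader.systemd-boot.enable = true;

  # Disk encryption declared once, not re-scripted per release.
  boot.initrd.luks.devices.root.device = "/dev/disk/by-label/cryptroot";

  environment.systemPackages = with pkgs; [ git kopia ];

  users.users.alice = {
    isNormalUser = true;
    extraGroups = [ "wheel" ];
  };
}
```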