| ▲ | adrian_b 16 hours ago |
| There are programs that can add an arbitrary amount of redundancy to your backup archives, so that they survive any corruption affecting no more data than the added redundancy. On Linux, for instance, there is par2cmdline. For all my backups, I create pax archives, which are then compressed, then encrypted, then expanded with par2create, then aggregated again into a single pax file. (The legacy tar file formats are not good at faithfully storing all the metadata of modern file systems, and each tar program may have its own proprietary, non-portable extensions to handle this, so I use only the pax file format.) Besides that, important data should be replicated and stored on 2 or even 3 SSDs/HDDs/tapes, which should themselves preferably be kept in different locations. |
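The redundancy idea can be sketched with a toy example. par2 uses Reed–Solomon coding; the Python below is plain XOR parity (a single-erasure special case, not par2's actual format), just to show how stored redundancy lets you rebuild a lost block:

```python
# Toy erasure-coding demo: one XOR parity block can reconstruct any
# single lost data block. par2 generalizes this with Reed-Solomon
# codes, so N recovery blocks can repair up to N damaged blocks.
def xor_blocks(blocks):
    """XOR equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"blockAAA", b"blockBBB", b"blockCCC"]
parity = xor_blocks(data)          # the redundancy stored alongside the data

# Simulate losing data[1], then rebuild it from the survivors + parity.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == b"blockBBB"
```

The same principle scales: the more recovery data you store, the larger the corrupted region you can survive.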
|
| ▲ | antonkochubey 15 hours ago | parent | next [-] |
| Unfortunately, some SSD controllers flatly refuse to return data they consider corrupted. Even if you have extra parity that could restore the corrupted data, your entire drive might refuse to read. |
| |
| ▲ | lazide 14 hours ago | parent [-] | | Huh? The issue being discussed is random blocks, yes? If your entire drive is bricked, that is an entirely different issue. | | |
| ▲ | jeremyvisser 14 hours ago | parent [-] | | Here’s the thing. That SSD controller is the interface between you and those blocks. If it decides, by some arbitrary measurement, as defined by some logic within its black box firmware, that it should stop returning all blocks, then it will do so, and you have almost no recourse. This is a very common failure mode of SSDs. As a consequence of some failed blocks (likely exceeding a number of failed blocks, or perhaps the controller’s own storage failed), drives will commonly brick themselves. Perhaps you haven’t seen it happen, or your SSD doesn’t do this, or perhaps certain models or firmwares don’t, but some certainly do, both from my own experience, and countless accounts I’ve read elsewhere, so this is more common than you might realise. | | |
| ▲ | cogman10 5 hours ago | parent | next [-] | | I really wish this responsibility was something hoisted up into the FS and not a responsibility of the drive itself. It's ridiculous (IMO) that SSD firmware is doing so much transparent work just to keep the illusion that the drive is actually spinning metal with similar sector write performance. | | |
| ▲ | immibis an hour ago | parent [-] | | Linux supports raw flash as an MTD device (memory technology device). It's often used in embedded systems, and it has MTD-native filesystems such as ubifs. But it's only really used in embedded systems because... PC SSDs don't expose that kind of interface. (Nor would you necessarily want them to - a faulty driver would quietly brick your hardware in a matter of minutes to hours.) |
| |
| ▲ | londons_explore 8 hours ago | parent | prev | next [-] | | The mechanism is usually that the SSD controller requires some work to be done before your read - for example, rewriting some access tables to record 'hot' data. That work can't be done because there are no free blocks, and no space can be freed up because every spare writable block is bad or in some other unusable state. The drive is therefore dead - it will enumerate, but neither read nor write anything. | |
| ▲ | reactordev 13 hours ago | parent | prev | next [-] | | This is correct: you still have to go through the firmware to gain access to the block/page on “disk”, and if the firmware decides the block is invalid then it fails. You can sidestep this by bypassing the controller on a test bench, though - pinning wires to the chips. At that point it’s no longer an SSD. | |
| ▲ | lazide 10 hours ago | parent | prev [-] | | Yes, and? HDD controllers dying and head crashes are a thing too. At least in the ‘bricked’ case it’s a trivial RMA - corrupt blocks tend to be a harder fight. And since ‘bricked’ is such a trivial RMA, manufacturers have more of an incentive to avoid it in the first place, or fix it, or go broke. This is why backups are important now, and always have been. | |
| ▲ | mort96 8 hours ago | parent [-] | | We're not talking about the SSD controller dying. The SSD controller in the hypothetical situation that's being described is working as intended. |
|
| ▲ | mywittyname 5 hours ago | parent | prev | next [-] |
| This is fine, but I'd prefer an option to transparently add parity bits on the drive, even if it means giving up some capacity. Personally, I keep backups of critical data on a platter-disk NAS, so I'm not concerned about losing critical data off an SSD. However, I did recently have to reinstall Windows on a computer because of a randomly corrupted system file - which is something this feature would have prevented. |
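For the detection half of that wish, a hypothetical sketch: a checksum manifest (SHA-256 here, with made-up filenames) catches silent corruption of the kind that forced that reinstall, though actually repairing the file still needs parity data or a backup:

```python
# Sketch: detect silent corruption with a SHA-256 manifest.
# Detection only -- recovery still needs a backup or parity data,
# e.g. par2 recovery blocks as discussed upthread.
import hashlib
import os

def build_manifest(paths):
    """Map each path to the SHA-256 digest of its contents."""
    manifest = {}
    for path in paths:
        with open(path, "rb") as f:
            manifest[path] = hashlib.sha256(f.read()).hexdigest()
    return manifest

def find_corrupted(manifest):
    """Return paths whose current hash no longer matches the manifest."""
    bad = []
    for path, digest in manifest.items():
        with open(path, "rb") as f:
            if hashlib.sha256(f.read()).hexdigest() != digest:
                bad.append(path)
    return bad

# Demo: snapshot a file, flip one byte, and catch the mismatch.
with open("demo.bin", "wb") as f:
    f.write(b"important system file contents")
manifest = build_manifest(["demo.bin"])

with open("demo.bin", "r+b") as f:   # simulate random bit rot
    f.seek(3)
    f.write(b"\x00")

corrupted = find_corrupted(manifest)
assert corrupted == ["demo.bin"]
os.remove("demo.bin")                # clean up the demo file
```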
|
| ▲ | casenmgreen 4 hours ago | parent | prev [-] |
| Thank you for this. I had no knowledge of pax, or that par was an open standard, and I care about what they help with. Going to switch over to using both in my backups. |