instagib 5 hours ago

I have a script that creates a hash based on all the files in a directory ("photos 2004", say), then saves the hash separately to a text file.

I have 3 copies, so I can check the archive version, the active storage volume, and the local version to see if any of them lost integrity in the transfer process.
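
A minimal sketch of that kind of script in Python (the directory and manifest names are placeholders):

    import hashlib, os, sys

    def hash_file(path, chunk=1 << 20):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            while True:
                block = f.read(chunk)
                if not block:
                    break
                h.update(block)
        return h.hexdigest()

    def write_manifest(root, out):
        # One "digest  relative/path" line per file; sorting the walk
        # keeps the manifest stable so copies compare with plain diff.
        with open(out, "w") as m:
            for dirpath, dirs, names in os.walk(root):
                dirs.sort()
                for name in sorted(names):
                    p = os.path.join(dirpath, name)
                    m.write(f"{hash_file(p)}  {os.path.relpath(p, root)}\n")

    write_manifest(sys.argv[1], sys.argv[2])  # e.g. photos-2004/ photos-2004.sha256

Running it against each of the three copies and diffing the manifests shows exactly which files, if any, lost integrity.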

I’m curious how it would compare against my old CDs and DVDs that were previous backups. My work does something similar for tape drive data.

pwg 3 hours ago

If you are willing to sacrifice some storage space on the disc, then dvdisaster (https://en.wikipedia.org/wiki/Dvdisaster) can add extra ECC data that allows recovery even if some percentage of the disc errors out on a later read.

Granted, if one no longer has a working drive, or if the disc errors out beyond the threshold the extra ECC can correct, the data is still lost. But dvdisaster does provide some protection against the "bit-rot" case where the disc slowly degrades.

iamnothere 15 minutes ago

Par2 is also very good for resilient storage. It uses parity files that can reconstruct bit-rotted files. https://en.wikipedia.org/wiki/Parchive
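
A stripped-down illustration of the parity idea (Par2's actual recovery blocks are Reed-Solomon based, so they can rebuild more than the single lost block this sketch handles):

    # Single XOR parity block: enough to rebuild any ONE missing block.
    # Par2's Reed-Solomon recovery blocks generalize this to many losses.
    def make_parity(blocks):
        parity = bytearray(len(blocks[0]))
        for b in blocks:
            for i, byte in enumerate(b):
                parity[i] ^= byte
        return bytes(parity)

    def rebuild(damaged, parity):
        # damaged: the block list with exactly one entry replaced by None
        out = bytearray(parity)
        for b in damaged:
            if b is not None:
                for i, byte in enumerate(b):
                    out[i] ^= byte
        return bytes(out)

    blocks = [b"AAAA", b"BBBB", b"CCCC"]
    parity = make_parity(blocks)
    assert rebuild([b"AAAA", None, b"CCCC"], parity) == b"BBBB"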

dehrmann 4 hours ago

DVDs use Reed–Solomon coding, so they effectively store a hash and recovery data for you. When a sector is irrecoverable, reading that sector fails.
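
Roughly that recover-or-fail behavior, sketched in Python with the third-party reedsolo package (an approximation; the real Reed-Solomon decoding happens inside the drive):

    # Conceptual demo with "pip install reedsolo"; an actual DVD drive
    # applies Reed-Solomon in hardware, not in software like this.
    from reedsolo import RSCodec, ReedSolomonError

    rsc = RSCodec(10)              # 10 ECC bytes: fixes up to 5 byte errors
    encoded = rsc.encode(b"family photos, summer 2004")

    corrupted = bytearray(encoded)
    corrupted[0] ^= 0xFF           # simulate bit rot in two positions
    corrupted[7] ^= 0xFF

    decoded = rsc.decode(corrupted)[0]   # recent reedsolo returns a tuple
    assert bytes(decoded) == b"family photos, summer 2004"
    # Past the correction budget, decode raises ReedSolomonError: that is
    # the hard "reading that sector fails" case.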

Gabrys1 4 hours ago

For this purpose, I think it would be nice to access the raw data, to see any errors that would otherwise be masked. As someone else in the comments suggested, one could compare the number of corrected errors after 1, 2, and 5 years against the number of redundant bits stored, to estimate the expected longevity of the medium.
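
Something like this back-of-the-envelope estimate, with invented numbers and a naive linear fit:

    # Fit a line to corrected-error counts measured at a few ages, then
    # see when the trend would exhaust the correction budget. All numbers
    # are made up, and real degradation is rarely this linear.
    def years_until_budget(samples, budget):
        n = len(samples)
        sx = sum(x for x, _ in samples)
        sy = sum(y for _, y in samples)
        sxx = sum(x * x for x, _ in samples)
        sxy = sum(x * y for x, y in samples)
        slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
        intercept = (sy - slope * sx) / n
        return (budget - intercept) / slope

    scans = [(1, 120), (2, 260), (5, 700)]         # (age in years, corrected errors)
    print(years_until_budget(scans, budget=5000))  # ~35 years at this trend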

dehrmann 3 hours ago

dvdisaster might already be able to do this analysis.