Backup and recovery is a process with a non-zero failure rate. The more you test it, the lower the rate, but there is always a failure mode. With these systems, the runtime guarantees of data integrity are very high and the failure rate is very low. And best of all, failure is constantly happening as a normal activity in the system. So once the data integrity guarantees of your runtime system are better than those of your backup process, why back up? There are still reasons, but they become more specific to the data being stored and less important as a general datastore feature.
| |
overfeed 6 days ago:

...and the "Disaster" in "Disaster recovery" may have been localized and extensive (fire, flooding, a major earthquake, brownouts due to a faulty transformer, building collapse, a solvent tanker driving through the wall into the server room, a massive sinkhole, etc.).
shermantanktop 6 days ago:

Yes, the dreaded fiber vs. backhoe. But if your distributed file system is geographically redundant, you're not exposed to that, at least from an integrity POV. It sucks that 1/3 or 1/5 or whatever of your serving fleet just disappeared, but backup won't help with that.
overfeed 5 days ago:

> But if your distributed file system is geographically redundant

Redundancy and backups are not the same thing! There's some overlap, but treating them as interchangeable will occasionally result in terrible outcomes, like when a config change causes all 5/5 datacenters to fragment and fail to form a quorum, and you then discover your services have circular dependencies while trying to bootstrap foundational services. Local backups would solve this: each DC would load its last known good config, whereas rebuilding the consensus needed for redundancy requires coordination with hosts that are now unreachable.
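To make the distinction concrete, here is a minimal sketch (not any particular system's API; the path and function names are hypothetical) of the bootstrap fallback described above: try the normal quorum-backed config fetch first, and if no majority can be formed, fall back to a locally persisted last-known-good copy so the DC can start its foundational services without depending on unreachable peers.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
)

// Config is a stand-in for whatever foundational config a DC needs to boot.
type Config struct {
	Version int               `json:"version"`
	Values  map[string]string `json:"values"`
}

// fetchFromQuorum represents the normal path: read config from the
// replicated store, which requires a majority of peers to be reachable.
func fetchFromQuorum() (*Config, error) {
	// Hypothetical: in the outage described above this always fails,
	// because the datacenters have fragmented and no majority exists.
	return nil, errors.New("quorum unavailable")
}

// loadLastKnownGood reads the local backup copy written the last time
// this DC successfully fetched config.
func loadLastKnownGood(path string) (*Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

// bootstrapConfig prefers the quorum path, but falls back to the local
// backup so startup never depends on coordination with unreachable hosts.
func bootstrapConfig(backupPath string) (*Config, error) {
	if cfg, err := fetchFromQuorum(); err == nil {
		return cfg, nil
	}
	return loadLastKnownGood(backupPath)
}

func main() {
	// Hypothetical backup location; real systems would pick their own.
	cfg, err := bootstrapConfig("/var/lib/example/last_known_good.json")
	if err != nil {
		fmt.Println("cannot bootstrap:", err)
		return
	}
	fmt.Println("booted with config version", cfg.Version)
}
```

The point of the sketch is only that the fallback copy is local and needs no coordination, which is exactly what redundancy alone doesn't give you when every replica is partitioned at once.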
|
|
|