Remix.run Logo
avinassh 2 days ago

> The WAL was corrupted, the actual data is lost. There's no parity. You're suggesting that sqlite should somehow recreate the data from nothing.

Not all frames in the WAL are important. Sure, recovery may be impossible in some cases, but not all checksum failures are impossible to recover from.

daneel_w 2 days ago | parent | next [-]

My feeling is that any portion that no longer checksums correctly is practically forfeited, which makes any form of recovery an arbitrary result - a half-baked transaction. I don't see how that's a more desirable outcome than rolling back to the last known-good commit, which is never arbitrary.

avinassh 2 days ago | parent [-]

I posted in another comment: There is a smoll demo in the post. In it, I corrupt an old frame that is not needed by the database at all. Now, one approach would be to continue the recovery and then present both states: one where the WAL is dropped, and another showing whatever we have recovered. If I had such an option, I would almost always pick the latter.

nemothekid 2 days ago | parent | next [-]

>one where the WAL is dropped, and another showing whatever we have recovered. If I had such an option, I would almost always pick the latter

I can't imagine picking the latter unless you were treating sqlite like a filesystem of completely unrelated blobs.

If I run three transactions where:

1. John gives $100 to Sue.

2. Sue gives $100 to Mark.

3. Mark $100 money to Paul.

If sqlite, just erases transaction (2), then Mark materializes $100 from nowhere. The rest of your database is potentially completely corrupted. At that point your database is no longer consistent - I can't see how you would "almost always" prefer this.

If (2) is corrupt, then the restore stops at (1), and you are guaranteed consistency.

avinassh 2 days ago | parent [-]

Yes, this is a good example of showing an we cannot partly apply the WAL always. Again, let me repeat it, we cannot partly apply the WAL all the time expect it to work but there are some valid cases where we can do that to recover. Your example is not the one.

2 days ago | parent | prev [-]
[deleted]
pests 2 days ago | parent | prev | next [-]

> but not all checksum failures are impossible to recover from

Which failures are possible to recover from?

jandrewrogers 2 days ago | parent | next [-]

There are two examples I know of that require no additional data:

First, force a re-read of the corrupted page from disk. A significant fraction of data corruption occurs while it is being moved between storage and memory due to weak error detection in that part of the system. A clean read the second time would indicate this is what happened.

Second, do a brute-force search for single or double bit flips. This involves systematically flipping every bit in the corrupted page, recomputing the checksum, and seeing if corruption is detected.

daneel_w 2 days ago | parent [-]

> A significant fraction of data corruption occurs while it is being moved between storage and memory

Surely you mean on the memory bus specifically? SATA and PCIe both have some error correction methods for securing transfers between storage and host controller. I'm not sure about old parallel ATA. While I understand it can happen under conditions similar to non-ECC RAM being corrupted, I don't think I've ever heard or read about a case where a storage device randomly returned erroneous data, short of a legitimate hardware error.

jandrewrogers 2 days ago | parent [-]

The error correction in PCIe, SATA, etc is too weak to be reliable for modern high-performance storage. This is a known deficiency and why re-reading a corrupted page sometimes fixes things. PCIe v6 is introducing a much stronger error detection scheme to address this, which will mostly leave bit-rot on storage media as the major vector.

The bare minimum you want these days is a 64-bit CRC. A strong 128-bit hash would be ideal. Even if you just apply these at the I/O boundaries then you'll catch most corruption. The places it can realistically occur are shrinking but most software makes minimal effort to detect this corruption even though it is a fairly well-bounded problem.

daneel_w 2 days ago | parent [-]

Thanks for the tech details.

dec0dedab0de 2 days ago | parent | prev [-]

what if the corruption only affected the stored checksum, but not the data itself?

kstrauser 2 days ago | parent | next [-]

And you would detect this how?

2 days ago | parent [-]
[deleted]
daneel_w 2 days ago | parent | prev [-]

...then the checksum can't clear the associated operation. Same result.

2 days ago | parent | prev [-]
[deleted]