klodolph 3 days ago

:-/ it’s a statistical guarantee in the first place. A successful commit in a durable storage engine just needs to achieve some finite level of durability, like “10^-7 probability of loss per year”. The durability is a property of the whole system, and it is possible to achieve durability without fsync, you just may have a hard time explaining what the durability is, how you calculated it, and what the evidence or justifications are for the numbers you give.

Even if you just look at hardware failure rates, you get unrecoverable I/O errors (data corruption) at about one in 10^15 bits, disk failures at a rate of about 1% per year, etc. People usually like to have better guarantees than those numbers give you with just a plain fsync anyway, so you are probably forced to do an analysis of the whole system if you want to provide good durability guarantees and be able to explain where the guarantees come from.

asdfasgasdgasdg 3 days ago | parent | next [-]

10^-7 (losses/record) * 10^8 (records/year) yields 10 data losses per year. If you're even a medium-sized business, you need a per-record loss probability much better than 10^-7.
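The arithmetic here can be sketched quickly (the loss probability and record count are the hypothetical figures from this comment):

```python
# Expected annual loss events, assuming independent per-record failures.
p_loss_per_record_year = 1e-7  # hypothetical annual loss probability per record
records = 1e8                  # hypothetical record count

expected_losses_per_year = p_loss_per_record_year * records
print(expected_losses_per_year)  # ≈ 10 loss events per year
```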

Dylan16807 3 days ago | parent | next [-]

That's only true if your typical loss event loses one record. If you have a one in a million chance of an array failure taking out 10% of your production database, and otherwise have zero possibility of data loss, you also get 10^-7 losses per record.

And I wouldn't assume they meant that number to be per record in the first place.

asdfasgasdgasdg 3 days ago | parent [-]

I don't think anyone in history has ever achieved a true 10^-7 annual probability of any data loss incident. So they must have been making some kind of per record or per operation claim.

klodolph 3 days ago | parent [-]

I like to think that the true AFR for data is bounded by something like 10^-3, because maybe that’s close to the rate at which civilizations collapse. You have to use a kind of subtle definition to support 10^-7 or 10^-9 or 10^-11. Or maybe instead of “subtle definition”, you can call it a “whimsical, imaginary definition”. Depends on how cynical you are.

The way I would go is to multiply the number of objects by the AFR, which comes close to the actual losses in most years. You can then exclude WW3 and the late Holocene extinction event from your consideration. Or simple bankruptcy, for that matter. If your employer is gone, you don't care about its data any more.

klodolph 3 days ago | parent | prev [-]

The half-remembered storage system I pulled those numbers from had records ~100 GB in size, so a 10^-7 loss rate is 1 loss event per year, per exabyte of data. A loss event is just "at least one bit in the record cannot be read within a certain deadline".
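As a sanity check of those numbers (record size and loss rate are the figures from this comment):

```python
# 10^-7 annual loss probability per ~100 GB record ≈ 1 loss per exabyte-year.
record_size_bytes = 100e9                       # ~100 GB per record
records_per_exabyte = 1e18 / record_size_bytes  # 1e7 records in an exabyte
loss_rate_per_record_year = 1e-7

events_per_exabyte_year = records_per_exabyte * loss_rate_per_record_year
print(events_per_exabyte_year)  # ≈ 1 loss event per exabyte-year
```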

Durability is a knob. If you have enough data, or turn the knob too far in the direction of durability, you will simply bankrupt yourself or maybe drown your service in latency. It makes sense that you would have storage services that provide different levels of durability.

jakewins 3 days ago | parent | prev [-]

I used to say this as well, but the industry has, for a long time now, equated "durable" with "stored on disk". Any DBA will assume that's what it means, and will rely on that fact when working out the replication they need, whether in clustering or in RAID.

If you’re building a data storage system and are using the term “durable” to mean “it’s in RAM on three virtual machines”, for example, I don’t think it’s unfair to say that you are lying to your customers, because you are intentionally misusing a well-established term.

zbentley 3 days ago | parent | next [-]

I forget the product, but more than a decade ago I remember someone broke out their durability guarantees into a table. The columns were all the settings their data store offered, from "RAM on one node" to "fsync confirmed on a quorum of nodes' disks"; the rows were example failure cases, ranging from "unexpected reboot of one machine" to "catastrophic loss of quorum-1 machines". The cells gave the data-loss risk, from "prevented" to "possible" to "likely".

That was very helpful when choosing durability levels.

klodolph 3 days ago | parent | prev [-]

I don't have any respect for the viewpoint that "durable" is equivalent to "stored on disk", and I don't want to spend time accommodating it. It is an oversimplification, and a bad one.

AFRs and discussions about different failure scenarios are the bare minimum. The bare minimum for scenarios is disk loss, total machine loss, and data center loss. This is just my take on things. I don’t care if something is on disk or not. I do care what happens when a sector on disk goes bad, when a faulty power supply destroys all the disks in a machine, or when a data center floods.

That forces you to think about things like whether you want to turn on synchronous replication.

jakewins 2 days ago | parent [-]

The point of “durable” implying stored to durable media is precisely that it allows the operator of the system to make that kind of calculation. They know the disks they picked and the replication chosen, and as long as the database calls fsync, their calculations will work.

My beef is with database systems that use the argument you made further up the thread to skip fsync and juice their performance numbers. Data is not "durable" if turning off the machines storing it means it's lost; that's a category difference, not a pure probability difference as you are claiming.

It is of course totally fine not to store data to durable media and to say the risk of devops doing a coordinated reboot is as low as the risk of RAID disk data loss, but then don't use the word "durable".

klodolph 7 hours ago | parent [-]

That definition of durable doesn’t seem useful to me, sorry. I want the failure rates and scenarios.