Remix.run Logo
mrkeen 3 hours ago

One of the perks of being distributed, I guess.

The kind of failure that a system can tolerate with strict fsync but can't tolerate with lazy fsync (i.e. the software 'confirms' a write to its caller but then crashes) is probably not the kind of failure you'd expect to encounter on a majority of your nodes all at the same time.

johncolanduoni 2 hours ago | parent [-]

It is if they’re in the same physical datacenter. Usually the way this is done is to wait for at least M replicas to fsync, but only require the data to be in memory for the rest. It smooths out the tail latencies, which are quite high for SSDs.

senderista an hour ago | parent [-]

You can push the safety envelope a bit further and wait for your data to only be in memory in N separate fault domains. Yes, your favorite ultra-reliable cloud service may be doing this.