▲ otterley | 3 hours ago
> If you want durability, a single physical machine is never enough.

It absolutely can be. Perhaps you are unfamiliar with modern cloud block storage, or RAID backed by NVRAM? Both have durability far above and beyond a single physical disk. On AWS, for example, EBS io2 Block Express offers 99.999% durability. Alternatively, you can, of course, build your own RAID 1 volumes atop ordinary gp3 volumes if you like, to design for similar loss probabilities.

Again, auditors do not care -- a fact you admitted yourself! They care about whether you took reasonable steps to ensure correctness and availability when needed. That is all.

> when regs say “thou shalt not lose thy data”, I move the other way. Which is why the streams are separate. It does impose an architectural design constraint because audit can’t be treated as a subset of logs.

There's no conflict between treating audit logs as logs -- which they are -- and having separate delivery streams and treatment for different retention and durability policies. Regardless of how you manage them, it doesn't change their fundamental nature. Don't confuse the nature of logs with the level of durability you want to achieve with them. They're orthogonal matters.
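The RAID 1 loss-probability argument above can be sketched numerically. This is a back-of-the-envelope calculation, assuming gp3's published annual failure rate of roughly 0.1%-0.2% and (optimistically) independent failures across the two mirrored volumes:

```python
# Back-of-the-envelope durability of RAID 1 atop two gp3 volumes.
# Assumes the worst-case published gp3 annual failure rate (0.2%)
# and independent failures -- an optimistic simplification.
afr_single = 0.002            # annual failure rate of one gp3 volume
afr_mirror = afr_single ** 2  # both mirrors must fail in the same year

durability_single = 1 - afr_single
durability_mirror = 1 - afr_mirror

print(f"single gp3:  {durability_single:.3%} durable")
print(f"RAID 1 pair: {durability_mirror:.4%} durable")  # roughly five nines
```

The squaring assumes independent failures; correlated events (same AZ, same power fault) and rebuild windows make the real figure lower, which is why this is a design target rather than a guarantee.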
▲ lll-o-lll | 2 hours ago | parent
> It absolutely can be. Perhaps you are unfamiliar with modern cloud block storage, or RAID backed by NVRAM? Both have durability far above and beyond a single physical disk. On AWS, for example, EBS io2 Block Express offers 99.999% durability. Alternatively, you can, of course, build your own RAID 1 volumes atop ordinary gp3 volumes if you like, to design for similar loss probabilities.

Certainly you can solve for zero data loss (RPO = 0) at the infrastructure level. It involves synchronously replicating that data to a separate physical location. If your threat model includes “fire in the DC”, reliable storage isn’t enough: to survive a site catastrophe with no data loss you must maintain a second, live copy (synchronous replication before ack) in another fault domain. In practice, in my experience, this is done at the application level rather than with infrastructure.

> There's no conflict between treating audit logs as logs -- which they are -- and having separate delivery streams and treatment for different retention and durability policies

It matters to me, because I don’t want to be dependent on a sync ack between two fault domains for 99.999% of my logs. I only care about this when the regulator says I must.

> Again, auditors do not care -- a fact you admitted yourself! They care about whether you took reasonable steps to ensure correctness and availability when needed. That is all.

I care about matching the solution to the regulation, which varies considerably by country and use case. However, there are multiple cases I have been involved with where the stipulation was “you must prove you cannot lose this data, even in the case of a site-wide catastrophe”. That’s what RPO zero means. It’s DR, i.e., after a disaster. For nearly everything, 15 minutes is good, if not great. Not always.
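The “synchronous replication before ack” pattern described above can be illustrated with a minimal sketch. The names (`FaultDomain`, `replicated_write`) are illustrative, and local files stand in for storage in two separate sites:

```python
# Minimal sketch of "synchronous replication before ack" (RPO = 0):
# a write is acknowledged only after BOTH fault domains have durably
# persisted it. FaultDomain and replicated_write are illustrative names;
# a local file stands in for each site's storage.
import os

class FaultDomain:
    """Stands in for durable storage in one site/AZ."""
    def __init__(self, path: str):
        self.path = path

    def persist(self, record: bytes) -> None:
        # Append and fsync so the record survives a crash of this domain.
        with open(self.path, "ab") as f:
            f.write(record + b"\n")
            f.flush()
            os.fsync(f.fileno())

def replicated_write(record: bytes, primary: FaultDomain,
                     secondary: FaultDomain) -> bool:
    # Ack only once every fault domain has durably stored the record.
    # If the secondary is unreachable, the write fails rather than acks:
    # that availability cost is the price of RPO = 0.
    for domain in (primary, secondary):
        domain.persist(record)
    return True  # now safe to acknowledge to the client

site_a = FaultDomain("site_a.log")
site_b = FaultDomain("site_b.log")
assert replicated_write(b"audit: user=alice action=delete", site_a, site_b)
```

The sketch shows why this is usually done at the application level: the write path itself must block on the remote ack, which infrastructure-level mirroring within one site cannot provide.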