lll-o-lll 3 hours ago

> And it is! The record is either in local storage or in central storage.

But it isn’t! There are many hardware failure modes in which you aren’t getting your log back.

For the same reason that you need acks=all in Kafka for zero data loss, or synchronous_commit = on (wait for the flush on a synchronous standby) in PostgreSQL, you need to commit your audit log to more than the local disk!
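
Concretely, something like this on the Kafka side (just a sketch with the Java client; the broker list and the "audit" topic name are made up, and acks=all only buys multi-node durability if the topic's min.insync.replicas is at least 2):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class DurableAuditWrite {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092,broker3:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // Wait for every in-sync replica, not just the partition leader.
            props.put(ProducerConfig.ACKS_CONFIG, "all");
            // Retry transient broker failures without duplicating records.
            props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                ProducerRecord<String, String> event =
                        new ProducerRecord<>("audit", "order-1234", "{\"action\":\"order.created\"}");
                // Block until the acks arrive; only then tell the caller the audit record is safe.
                producer.send(event).get();
            }
        }
    }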

otterley 2 hours ago

If your hardware and software can’t guarantee that writes are committed when they say they are, all bets are off. I am assuming a scenario in which your hardware and/or cloud provider doesn’t lie to you.
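
By “committed” I mean the software side has done an explicit flush and, past that point, you are trusting the device to have told the truth. Roughly this (a Java sketch; the file path and helper name are illustrative):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class LocalAuditAppend {
        public static void append(Path auditFile, String line) throws IOException {
            try (FileChannel ch = FileChannel.open(auditFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
                ch.write(ByteBuffer.wrap((line + "\n").getBytes(StandardCharsets.UTF_8)));
                // Flush data and metadata to the device before reporting success.
                // Beyond this call you are relying on the disk/firmware honouring the flush.
                ch.force(true);
            }
        }
    }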

In the world you describe, you don’t have any durability when the network is impaired. As a purchaser I would not accept such an outcome.

lll-o-lll an hour ago

It’s about avoiding single points of failure.

> In the world you describe, you don’t have any durability when the network is impaired.

Yes, the real world. If you want durability, a single physical machine is never enough.

This is standard distributed computing, and we’ve had all (most) of the literature and understanding of this since the ’70s. It’s complicated, and painful to get right, which is why people normally default to a DB (or a cloud managed service).

The reason this matters for this logging scenario is that I normally don’t care if I lose a bit of logging in a catastrophic failure case. It’s not ideal, but I’m trading RPO for performance. However, when the regs say “thou shalt not lose thy data”, I move the other way. That’s why the streams are separate, and it does impose an architectural design constraint: audit can’t be treated as a subset of the logs.
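
Roughly the shape of that separation (a sketch; the names are made up): the audit trail gets its own pipeline with a hard durability contract, and ordinary logs stay best-effort.

    // Sketch of the split; interfaces and names are illustrative.
    interface AuditTrail {
        // Must not return until the event is acknowledged by remote/replicated storage
        // (e.g. a Kafka topic produced with acks=all, or a DB with a synchronous standby).
        // If this fails, the business operation fails with it.
        void record(String actor, String action, String target) throws Exception;
    }

    interface AppLog {
        // Best effort: buffered locally, shipped asynchronously. A crash can lose
        // whatever is still in flight; that is the RPO-for-performance trade.
        void info(String message);
    }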