Remix.run Logo
quibono 5 days ago

> storing audit data in the same DB and it inevitably having functionality written against it, your audit data becoming a part of the business logic

What's the "proper" way to do this? Separate DB? Separate data store?

KronisLV 5 days ago | parent | next [-]

Typically you want your audit/log data to be immutable and kept in an append only data store.

Whether that's a typical relational DB or something more specialized (like a log shipping solution) that's up to you, but usually it would be separate from the main DB.

If you need some functionality that depends on events that have taken place, you probably want to store information about those events in the main data store (but only what's needed for that functionality, not a list of all mutations done to a table like audit data might include).

In general, it's nice to have such a clear boundary of where the business domain ends and where the aux. stuff to help you keep it running goes - your logs and audit data, analytics and metrics, tracing spans and so on.

Edit: as a critique of my own arguments here, I will admit that doing the above can introduce some complexity and that in simpler systems it might be overkill. But I've seen what happens when everything is just in one huge DB instance, where about 90% of the overall schema size is literally due to records in those audit tables and everyone is surprised why opening the "History" tab for a record takes a while (and anything else that references said history, e.g. visibility of additional records), and it's not great either.

zbentley 5 days ago | parent | prev | next [-]

I was also onboard with GP’s comment until I got to this part.

Audit data in the same DB is great, because it can be written transactionally for relatively cheap (multi table updates, triggers, actual transactions with multiple writes, etc).

After that, sure, ship it elsewhere and prune the audit tables if you like. But having the audit writes go directly to Kafka or whatnot is a pain because it requires your client logic to a) have a distributed publish-event transaction (which can work in this case more easily than distributed transactions in general with careful use of idempotency keys, read back, or transactional outboxes, but it’s complicated and requires everyone writing to auditable tables to play along), and b) reduces your reliability because now the audit store or its message queue needs to be online for every write as well as your database.

And there’s plenty of good reasons for business logic to use (only for reads) audit data. What else would business logic do if an audit table existed and there was a business need to e.g. show customers a change history for something? Build another redundant audit system instead?

tremon 5 days ago | parent | prev [-]

Separate schema, no read permissions for the application identity is sufficient. It's not like "separate db" makes it magically unqueryable.