MaxGabriel 5 hours ago

This might stem from the domain I work in (banking), but I have the opposite take. Soft delete pros to me:

* It's obvious from the schema: if there's a `deleted_at` column, I know how to query the table correctly (vs. wrongly assuming rows are never DELETEd, or having to know which other table to look in)

* One way to do things: analytics queries and admin pages can all look at the same set of data, vs. having separate handling for historical data.

* DELETEs are likely fairly rare by volume for many use cases

* I haven't found soft-deleted rows to be a big performance issue. Intuitively that makes sense, since indexed queries are O(log N), so extra soft-deleted rows barely move the needle (a partial index over live rows helps here; see the sketch after this list)

* Undoing is really easy, because all the relationships stay in place, vs. the data having already been moved elsewhere. (In practice, I haven't found much need for this kind of undo.)
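A minimal sketch of what this looks like in Postgres (table and column names are illustrative, not from the thread). The partial index is the piece that keeps live-row queries fast no matter how many soft-deleted rows accumulate:

```sql
-- Illustrative soft-delete schema; NULL deleted_at means "live".
CREATE TABLE accounts (
    id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email      text NOT NULL,
    deleted_at timestamptz
);

-- Partial index: only live rows are indexed, so the usual
-- "WHERE deleted_at IS NULL" queries stay O(log N) in the number
-- of live rows, regardless of how many soft-deleted rows pile up.
CREATE UNIQUE INDEX accounts_live_email
    ON accounts (email)
    WHERE deleted_at IS NULL;

-- Soft delete and undo are both plain UPDATEs; every foreign-key
-- relationship stays intact throughout.
UPDATE accounts SET deleted_at = now() WHERE id = 42;
UPDATE accounts SET deleted_at = NULL  WHERE id = 42;
```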

In most cases, I've really enjoyed going even further and making rows fully immutable, using a new row to handle updates. This makes it really easy to reference historical data.
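One hedged way to sketch that append-only pattern (all names invented for the example): an "update" becomes an INSERT of the next version, and the current state is just the latest version per key.

```sql
-- Append-only: rows are never UPDATEd or DELETEd after insert.
CREATE TABLE account_versions (
    account_id bigint      NOT NULL,
    version    int         NOT NULL,
    email      text        NOT NULL,
    deleted    boolean     NOT NULL DEFAULT false,  -- a delete is just another version
    valid_from timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (account_id, version)
);

-- An "update" inserts the next version; history stays referenceable.
INSERT INTO account_versions (account_id, version, email)
VALUES (42, 2, 'new@example.com');

-- Current state: the highest version of each account.
SELECT DISTINCT ON (account_id) *
FROM account_versions
ORDER BY account_id, version DESC;
```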

If I were doing the logging approach described in the article, I'd use database triggers that keep a copy of every INSERTed/UPDATEd/DELETEd row in a duplicate table. That way it all stays in the same database, easy to query and replicate elsewhere.
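A sketch of that trigger approach in Postgres, reusing the illustrative `accounts` table from above (the article's actual logging scheme may differ):

```sql
-- Append-only history table: one row per INSERT/UPDATE/DELETE.
CREATE TABLE accounts_audit (
    audit_id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    op         text        NOT NULL,               -- INSERT, UPDATE, or DELETE
    changed_at timestamptz NOT NULL DEFAULT now(),
    row_data   jsonb       NOT NULL                -- full image of the affected row
);

CREATE FUNCTION accounts_audit_fn() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO accounts_audit (op, row_data) VALUES (TG_OP, to_jsonb(OLD));
    ELSE
        INSERT INTO accounts_audit (op, row_data) VALUES (TG_OP, to_jsonb(NEW));
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER accounts_audit_trg
    AFTER INSERT OR UPDATE OR DELETE ON accounts
    FOR EACH ROW EXECUTE FUNCTION accounts_audit_fn();
```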

gleenn 3 hours ago | parent | next

If you're implementing immutable DB semantics, maybe you should consider Datomic or alternatives, because then you get that for free, for everything, and you also get time travel, which is an amazing feature on top. It lets you see the full, coherent state of the DB at any moment!

nine_k 5 hours ago | parent | prev | next

> DELETEs are likely fairly rare by volume for many use cases

All your other points make sense, given this assumption.

I've seen tables where 50-70% of the rows were soft-deleted, and it did affect performance noticeably.

> Undoing is really easy

Depends on whether undoing ever actually happens, and on whether deletion and undeletion require audit records anyway.

In short, there are cases when soft-deletion works well, and is a good approach. In other cases it does not, and is not. Analysis is needed before adopting it.

tharkun__ 4 hours ago | parent | next

Agreed. And if deletes are soft, you likely really wanted a complete audit history of all updates too (at least that's been true in the cases I've been part of). And then performance _definitely_ would suffer if you didn't have a separate audit/archive table for all of those.

pixl97 3 hours ago | parent

I mean, yes, growth forever doesn't tend to work.

I've seen a number of apps that require audit histories handle this by archiving at a particular time; that's when the hard deletes occur and the indexes are fully rebuilt. It's typically scheduled during the least busy time of the year, as it's rather IO-intensive.

da_chicken 3 hours ago | parent | prev

> I've seen tables where 50%-70% were soft-deleted, and it did affect the performance noticeably.

At that point you should probably investigate partitioning or data warehousing.

eddd-ddde 4 hours ago | parent | prev | next

I never got to test this, but I always wanted to explore using table partitions in Postgres to store soft-deleted items on a different drive, as a kind of archived storage.

I'm pretty sure it is possible, and it might even yield some performance improvements.

That way you wouldn't have to worry about deleted items impacting performance too much.
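For what it's worth, a sketch of how this could look with declarative partitioning (tablespace path and names are made up; note that the partition key has to be part of the primary key):

```sql
-- Tablespace on a separate (e.g. slower/cheaper) drive.
CREATE TABLESPACE archive_space LOCATION '/mnt/slow_disk/pg_archive';

CREATE TABLE documents (
    id      bigint  NOT NULL,
    body    text    NOT NULL,
    deleted boolean NOT NULL DEFAULT false,
    PRIMARY KEY (id, deleted)          -- PK must include the partition key
) PARTITION BY LIST (deleted);

CREATE TABLE documents_live    PARTITION OF documents FOR VALUES IN (false);
CREATE TABLE documents_deleted PARTITION OF documents FOR VALUES IN (true)
    TABLESPACE archive_space;
```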

gleenn 3 hours ago | parent

It's definitely an interesting approach, but the problem is that now you have to change all your queries, and undeleting gets more complicated. There are strong trade-offs with almost all the approaches I've heard of.

snuxoll 3 hours ago | parent

With partitioning? No, you don't. It gets a bit messy if you also want to partition the table by other values (like a tenant id), since then you probably need to get into table inheritance instead of the easier declarative partitioning, but either technique gives you a single effective table to query.

edmundsauto an hour ago | parent

Pg moves the data between partitions on update?

bandrami 18 minutes ago | parent

If you're updating the parent table and the partition key is correctly defined, then an update that puts a row in a different partition is translated into a delete on the original child table and an insert on the new child table (since v11, IIRC). But this can lead to some weird results if you're using multiple inheritance, so, well, don't.
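Continuing the partitioning sketch from above, the row movement looks like this from the application's point of view (Postgres 11+):

```sql
-- Internally a DELETE from documents_live plus an INSERT into
-- documents_deleted, but expressed as a single UPDATE on the parent.
UPDATE documents SET deleted = true WHERE id = 42;

-- The parent still reads as one logical table:
SELECT * FROM documents WHERE id = 42;      -- finds the row in its new partition
SELECT * FROM documents WHERE NOT deleted;  -- plans only the live partition
```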

ozim an hour ago | parent | prev | next

> DELETEs are likely fairly rare by volume for many use cases

I think one of our problems is getting users to delete stuff they don’t need anymore.

rawgabbit 3 hours ago | parent | prev

I have worked with databases my entire career. I hate triggers with a passion. The issue is that no one "owns" triggers or has the authority to keep them clean; eventually they become a dumping ground for all sorts of nasty, slow code.

I usually tell people to stop treating databases like Firebase, waxing records and fields on and off willy-nilly. You need to treat the database as the store of your business process, and your business processes demand retention of all requests: you need to keep the request to soft-delete a record, and you need to keep the request to undelete a record.

Too much crap in the database? Then you need to create a field saying this record will be archived off by this date. On that date, you move the record into another table or file that is only accessible to admins. And yes, you need to keep a record of that archival as well. Too much gunk in your request logs? Well, then you need to create an archive process for that as well.
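A sketch of that archival pass, assuming a hypothetical `requests` table with an `archive_by` column and a pre-existing `archival_log` table for the record of the run (none of these names are from the thread):

```sql
-- Admin-only archive table with the same shape as the source.
CREATE TABLE requests_archive (LIKE requests INCLUDING ALL);

-- Move expired rows in one statement, so nothing is lost mid-move.
WITH moved AS (
    DELETE FROM requests
    WHERE archive_by <= now()
    RETURNING *
)
INSERT INTO requests_archive SELECT * FROM moved;

-- And keep a record of the archival itself, as the comment says.
INSERT INTO archival_log (table_name, archived_at)
VALUES ('requests', now());
```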

These principles are nothing new. They are in line with the "Generally Accepted Recordkeeping Principles", which are US-oriented; other countries have similar standards.