Remix.run Logo
gmassman 8 hours ago

This is a very handy postgres extension! We've been using it at my job for a couple years now to generate test datasets for developers. We have a weekly job that restores a prod backup to a temporary DB, installs the `anon` extension, and runs pg_dump with the masking rules. Overall we've been very happy with this workflow since it gives us a very good idea of how new features will work with our production data. The masking rules do need maintenance as our DB schema changes, but that's par for the course with these kinds of dev tools.

All that said, I wouldn't rely on this extension as a way to deliver anonymized data to downstream consumers outside of our software team. As others have pointed out, this is really more of a pseudonymization technique. It's great for removing phone numbers, emails, etc. from your data set, but it's not going to eradicate PII. Pretty much all anonymized records can be traced back to their source data through PKs or FKs.

daamien 4 hours ago | parent [-]

Pseudonymizing functions are just one way to mask the data.

There are many other masking functions that will actually anonymize the data.

And the extension does not force you to respect the foreign keys.

It's really up to you to decide how you want to implement your masking policy