gkbrk 18 hours ago
ClickHouse has something similar called clickhouse-obfuscator [1]. It even works offline on data dumps, so you can quickly prepare and send somewhat realistic example data to others. According to its --help output, it is designed to retain the following properties of the data:

- cardinalities of values (number of distinct values) for every column and for every tuple of columns;
- conditional cardinalities: number of distinct values of one column under condition on value of another column;
- probability distributions of absolute value of integers; sign of signed integers; exponent and sign for floats;
- probability distributions of length of strings;
- probability of zero values of numbers; empty strings and arrays, NULLs;
- data compression ratio when compressed with LZ77 and entropy family of codecs;
- continuity (magnitude of difference) of time values across table; continuity of floating point values;
- date component of DateTime values;
- UTF-8 validity of string values;
- string values continue to look somewhat natural.

[1]: https://clickhouse.com/docs/en/operations/utilities/clickhou...
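To make a few of those properties concrete, here is a minimal Python sketch of how one might compare an original dump against an obfuscated one. It is not part of clickhouse-obfuscator; the function names, the `profile` helper and the sample rows are all made up for illustration, and it only looks at cardinality, conditional cardinality and string lengths.

```python
from collections import Counter, defaultdict


def cardinality(rows, column):
    """Number of distinct values in one column."""
    return len({row[column] for row in rows})


def conditional_cardinalities(rows, column, given):
    """Sorted distinct-value counts of `column`, one per value of `given`.

    The group keys are dropped on purpose: obfuscation changes the values
    themselves, so only the shape of the grouping can be compared.
    """
    groups = defaultdict(set)
    for row in rows:
        groups[row[given]].add(row[column])
    return sorted(len(values) for values in groups.values())


def length_distribution(rows, column):
    """Histogram of string lengths for one column."""
    return Counter(len(row[column]) for row in rows)


def profile(rows, column, given):
    """Hypothetical helper: bundle the three measurements for comparison."""
    return (
        cardinality(rows, column),
        conditional_cardinalities(rows, column, given),
        length_distribution(rows, given),
    )


# Hand-written sample data, not real tool output.
original = [
    {"user": "alice", "url": "/a"},
    {"user": "alice", "url": "/b"},
    {"user": "bob", "url": "/a"},
]
obfuscated = [
    {"user": "xkqzp", "url": "/q"},
    {"user": "xkqzp", "url": "/r"},
    {"user": "mvt", "url": "/q"},
]

same_shape = profile(original, "url", "user") == profile(obfuscated, "url", "user")
```

Here `same_shape` compares the two profiles; a real check would also have to cover the numeric, compression and continuity properties from the list, which this sketch ignores.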
bux93 16 hours ago
The Dutch national office of statistics has tools called mu-argus [2] and tau-argus, intended to de-identify 'microdata' so that k-anonymity [1] is achieved.

[1] A release of data is said to have the k-anonymity property if the information for each person contained in the release cannot be distinguished from at least k-1 individuals whose information also appear in the release. https://en.wikipedia.org/wiki/K-anonymity
[2] https://research.cbs.nl/casc/mu.htm
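For anyone who wants the definition in executable form, below is a rough Python sketch of a k-anonymity check. It has nothing to do with how mu-argus works internally; the function names, the choice of quasi-identifier columns and the sample records are assumptions made up for the example.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the largest k for which `rows` is k-anonymous.

    Rows are grouped by their quasi-identifier values; k is the size of
    the smallest group, since every person in that group can only be
    distinguished from (group size - 1) others.
    """
    groups = Counter(
        tuple(row[name] for name in quasi_identifiers) for row in rows
    )
    return min(groups.values()) if groups else 0


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs at least k times."""
    return k_anonymity(rows, quasi_identifiers) >= k


# Made-up records: zip prefix and age band act as quasi-identifiers,
# diagnosis is the sensitive attribute and is not used for grouping.
records = [
    {"zip": "10**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "10**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "35**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "35**", "age": "30-39", "diagnosis": "diabetes"},
    {"zip": "35**", "age": "30-39", "diagnosis": "flu"},
]

k = k_anonymity(records, ["zip", "age"])
```

In this sample the smallest group has two rows, so the release is 2-anonymous but not 3-anonymous. Deciding which columns count as quasi-identifiers is the hard part in practice, and this sketch leaves it to the caller.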
JosephRedfern 12 hours ago
There's a write-up from Alexey of the different approaches considered for clickhouse-obfuscator here: https://clickhouse.com/blog/five-methods-of-database-obfusca.... The summary is pretty funny:

> "After trying four methods, I got so tired of this problem that it was time just to choose something, make it into a usable tool, and announce the solution"