dygd 19 hours ago:
> Each SDK might be tattling on you, but unless you give them a key to match you across apps, each signal from each app is unique

You'd be surprised what can be done when data from different sources is fused together.

Large-Scale Online Deanonymization with LLMs: https://news.ycombinator.com/item?id=47139716

Robust De-anonymization of Large Sparse Datasets: https://www.cs.cornell.edu/~shmat/shmat_oak08netflix.pdf
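To make the fusion point concrete, here is a minimal sketch (hypothetical data and field names, not any real SDK's schema): two datasets with no shared user ID can still be joined on a tuple of quasi-identifiers that is unique in practice, in the spirit of the sparse-dataset linkage attack in the Narayanan–Shmatikov paper linked above.

```python
# Hypothetical sketch: fusing two "anonymous" app datasets that share no key.
# The combination of quasi-identifiers acts as a de facto key.

app_a = [  # e.g. telemetry from a fitness app
    {"device_model": "Pixel 8", "city": "Austin", "signup_week": "2024-W02", "resting_hr": 62},
    {"device_model": "iPhone 15", "city": "Boston", "signup_week": "2024-W05", "resting_hr": 71},
]
app_b = [  # e.g. telemetry from a shopping app
    {"device_model": "Pixel 8", "city": "Austin", "signup_week": "2024-W02", "purchases": ["protein powder"]},
    {"device_model": "iPhone 15", "city": "Boston", "signup_week": "2024-W05", "purchases": ["running shoes"]},
]

QUASI_IDS = ("device_model", "city", "signup_week")

def fuse(a_rows, b_rows, keys=QUASI_IDS):
    """Join records whose quasi-identifier tuples match exactly."""
    index = {tuple(r[k] for k in keys): r for r in b_rows}
    merged = []
    for r in a_rows:
        match = index.get(tuple(r[k] for k in keys))
        if match:
            merged.append({**r, **match})  # combined profile spanning both apps
    return merged

profiles = fuse(app_a, app_b)
# Each fused profile now links health data to purchase history,
# even though neither dataset alone contained a user identifier.
```

Real-world linkage is fuzzier (approximate matches, timestamps, location traces), but the exact-join version already shows why "each signal is unique" is not a safety guarantee.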
sroussey 18 hours ago (reply):
There are whole companies that de-anonymize ad data as a service, which lets the many data brokers skip doing the last mile themselves and feel good about it. It’s a joke.
janalsncm 16 hours ago (reply):
I remember when the first article was posted. Their method requires two parallel corpora, e.g. people who write on both LinkedIn (under their real name) and Reddit. Also, people who post under their real name are likely to write in their real voice:

> Any deanonymization setup with ground truth introduces distributional biases. In our cross-platform datasets, the profiles are likely easier to deanonymize than an average profile: the very fact that ground truth exists implies that the user may not have cared about anonymity in the first place. Similarly, two split-profiles of a single user are inherently alike, whereas two pseudonymous accounts of the same person (e.g., an official and a pseudonymous alt account) might expose more heterogeneous micro-data.
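The cross-platform matching the quote describes can be sketched very roughly (this is a toy illustration, not the paper's pipeline, and all the text samples are invented): represent each account's writing as a word-frequency vector and rank candidate accounts by cosine similarity.

```python
# Toy sketch of cross-platform author matching via writing-style similarity.
# Real stylometry uses richer features (function words, n-grams, embeddings).
from collections import Counter
import math

def style_vector(text):
    """Crude style fingerprint: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical samples: one named-account post, two pseudonymous candidates.
linkedin_post = "excited to share our team shipped the new pipeline ahead of schedule"
reddit_candidates = {
    "user_a": "our team shipped the pipeline early, excited about the new release",
    "user_b": "cats are great pets and i love hiking on weekends",
}

scores = {u: cosine(style_vector(linkedin_post), style_vector(t))
          for u, t in reddit_candidates.items()}
best = max(scores, key=scores.get)  # the candidate most similar in style
```

This also makes the quoted caveat concrete: the toy example only "works" because both samples were written in the same voice, exactly the distributional bias that ground-truth cross-platform datasets bake in.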