janalsncm 16 hours ago
I remember when the first article was posted. Their method requires two parallel corpora, e.g., posts by the same people on LinkedIn (under their real name) and on Reddit. Also, people who post under their real name are likely to write in their real voice:

> Any deanonymization setup with ground truth introduces distributional biases. In our cross-platform datasets, the profiles are likely easier to deanonymize than an average profile: the very fact that ground truth exists implies that the user may not have cared about anonymity in the first place. Similarly, two split-profiles of a single user are inherently alike, whereas two pseudonymous accounts of the same person (e.g., an official and a pseudonymous alt account) might expose more heterogeneous micro-data.