saithir 12 hours ago
Because unlike the authors of this set - who went and stripped usernames and permalinks out of the posts to anonymize them - the set you mention just grabbed data out of the API as-is (at least based on what's left of its Hugging Face description). That's the difference.
spiffytech 10 hours ago
Just a reminder that anonymization is much harder than merely removing metadata. Every time I hear "anonymous data", I think of the time AOL published anonymized search logs (for academic research). The anonymization was negligent, and an NYT reporter de-anonymized and tracked down one of the users using the location and personal details present in the search queries themselves.
https://en.wikipedia.org/wiki/AOL_search_log_release
https://web.archive.org/web/20130404175032/http://www.nytime...
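To make the point concrete, here is a minimal Python sketch of why dropping metadata fields is not the same as anonymizing. Everything in it is an assumption for illustration: the field names (`username`, `permalink`, `body`), the sample post, and the three regex patterns are hypothetical, and real re-identification relies on far more than a few regexes.

```python
import re

# Hypothetical metadata fields that a naive "anonymizer" would drop.
METADATA_FIELDS = {"username", "permalink"}

# Hypothetical, deliberately crude patterns for clues left in free text.
QUASI_IDENTIFIERS = {
    "age": re.compile(r"\b\d{2}[- ]year[- ]old\b", re.IGNORECASE),
    "location": re.compile(r"\bin [A-Z][a-z]+, [A-Z]{2}\b"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
}


def strip_metadata(post: dict) -> dict:
    """Drop the explicit identifier fields; the text itself is untouched."""
    return {k: v for k, v in post.items() if k not in METADATA_FIELDS}


def leftover_clues(text: str) -> list[str]:
    """Name the quasi-identifier patterns that still match the text."""
    return sorted(name for name, pat in QUASI_IDENTIFIERS.items() if pat.search(text))


# Made-up sample post, not taken from any real dataset.
post = {
    "username": "example_user",
    "permalink": "/r/example/comments/abc123",
    "body": "I'm a 45-year-old teacher in Springfield, IL looking for a landscaper.",
}

scrubbed = strip_metadata(post)
clues = leftover_clues(scrubbed["body"])
print(sorted(scrubbed))  # only the body field is left
print(clues)             # clues that survived the metadata strip
```

The metadata is gone, but the body still carries an age and a location, which is the same kind of residue that made the AOL logs traceable.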