pavel_lishin 21 hours ago
Anonymizing data is incredibly difficult to do: https://www.theguardian.com/technology/2014/jun/27/new-york-...
> New York City has released data of 173m individual taxi trips – but inadvertently made it "trivial" to find the personally identifiable information of every driver in the dataset.
afarah1 20 hours ago
Interesting read, thanks. The related article shows that even more robust anonymization techniques may still be insufficient (in the case of the taxi rides, spatio-temporal analysis could still lead to de-anonymization). All the more reason to reduce data collection. Unfortunately, the trend among governments all around the world is the opposite.
wtallis 20 hours ago
That example only demonstrates leaked information about the drivers, not the passengers/customers. And the "anonymized" driver and license data wouldn't need to be released in any form at all to produce a dataset useful for public transportation planning: approximate time of day and approximate location are sufficient to estimate demand, and there's no need to keep track of who is making which trips.
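To make that concrete, here is a minimal sketch of the kind of aggregation described, assuming a made-up record layout (pickup time, latitude, longitude, driver id) and an arbitrary grid size of 0.01 degrees. The sample trips and the `demand_by_cell` helper are hypothetical, not taken from the NYC dataset. It keeps only the hour of day and a coarse grid cell, and never carries the identifier into the output.

```python
from collections import Counter
from datetime import datetime

# Hypothetical trip records: (pickup time, latitude, longitude, driver id).
trips = [
    (datetime(2013, 1, 1, 8, 5), 40.7512, -73.9941, "driver-A"),
    (datetime(2013, 1, 1, 8, 40), 40.7533, -73.9938, "driver-B"),
    (datetime(2013, 1, 1, 17, 15), 40.6413, -73.7781, "driver-A"),
]

def demand_by_cell(trips, grid_deg=0.01):
    """Count pickups per (hour of day, coarse grid cell)."""
    counts = Counter()
    for pickup_time, lat, lon, _driver_id in trips:
        # Snap coordinates to a grid; the driver id is deliberately ignored.
        cell = (round(lat / grid_deg), round(lon / grid_deg))
        counts[(pickup_time.hour, cell)] += 1
    return counts

demand = demand_by_cell(trips)
for (hour, cell), n in sorted(demand.items()):
    print(hour, cell, n)
```

Coarse binning like this is not a guarantee of anonymity on its own: a cell with a single trip in it can still point at one person, so a real release would likely also need to suppress or merge low-count cells.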
the_sleaze_ 19 hours ago
It's really not, unless of course you are disincentivized from providing anonymous data. The ground is thick with prior art and existing solutions. https://www.hhs.gov/hipaa/for-professionals/special-topics/d...