| ▲ | waerhert 10 hours ago | ||||||||||||||||
Thanks for the feedback! Rainbow tables probably won't work here since the epoch (rounded to 5 minutes) salts everything, so precomputed tables would expire constantly. You would use use BSSIDs + SSID in real implementations. The MinHash part creates 128 hash values from the entire observed set, not a subset of 3, then LSH divides these into bands where similar sets collide probabilistically based on Jaccard similarity. I've tested with two phones where one saw 3-4 networks and the other saw ~10. They still found each other, and you can try it in the interactive demo yourself. You're right it's not entirely privacy preserving, really depends on your threat model as I discuss in the security section. Your idea about combining area hints with time-based salts is interesting though, feels like it could bridge geospatial indexing with this approach! The whole geospatial indexing thing was something I became aware of late in the project and didn't go into further. It's probably a much simpler approach if geolocation is important. In the current approach, geolocation doesn't matter at all, only 'the environment', whatever that may be. | |||||||||||||||||
| ▲ | vessenes 9 hours ago | parent [-] | ||||||||||||||||
Thanks for the detailed response! I completely missed the band division / similarity plan - I’ll read more thoroughly next time. I thought about geo largely because it radically changes the order of magnitude of work necessary; it lets you segment ‘possible’ subsets of APs down to sets of say 100, not millions, and changes the combinatorics. A side effect is knowing a rough spatial location. Off the top of my head, I don’t think that epochs alone make a big difference. If I want to see if you’ve been somewhere, or tell you I’m somewhere, why not take the 3-4 networks you mentioned, and forward hash them for the next million epochs? Or, more ambitiously, why not take 3-4 networks each from the geo indexed clusters available at https://wigle.net/ and do the forward and backward epochs, letting me track where you’ve been and pretend to be near you any time in the future? Wigle reports 1.7bn networks; a rough look at a suburban street near me shows most places have 10 in a reasonable range boundary; so call it 200mm “locations” with 128 segmented hashes, 250 billion hashes per epoch — I think we’re in the “seconds per epoch” range for a reasonable compute heavy server to cover the entire space. Upshot - I think the salting needs to be something local / not predictable or stored remotely. Hopefully these comments hit you right - I like the idea a lot - and I don’t fully understand the system - but as I understand it, the system does not offer privacy — I could replay any phone’s hashes against a system that cost a few dollars to reconstruct your location and time, if my understanding is correct. | |||||||||||||||||
| |||||||||||||||||