vessenes 9 hours ago
Thanks for the detailed response! I completely missed the band division / similarity plan; I’ll read more thoroughly next time.

I thought about geo largely because it radically changes the order of magnitude of work necessary: it lets you segment ‘possible’ subsets of APs down to sets of, say, 100 rather than millions, and that changes the combinatorics. A side effect is knowing a rough spatial location.

Off the top of my head, I don’t think that epochs alone make a big difference. If I want to see if you’ve been somewhere, or tell you I’m somewhere, why not take the 3-4 networks you mentioned and forward hash them for the next million epochs? Or, more ambitiously, why not take 3-4 networks each from the geo-indexed clusters available at https://wigle.net/ and do the forward and backward epochs, letting me track where you’ve been and pretend to be near you any time in the future?

Wigle reports 1.7bn networks; a rough look at a suburban street near me shows most places have about 10 within reasonable range. So call it 200M “locations” with 128 segmented hashes each, roughly 25 billion hashes per epoch. I think we’re in the “seconds per epoch” range for a reasonably compute-heavy server to cover the entire space.

Upshot: I think the salting needs to be something local, not predictable or stored remotely.

Hopefully these comments hit you right. I like the idea a lot, and I don’t fully understand the system, but as I understand it, the system does not offer privacy: if my understanding is correct, I could replay any phone’s hashes against a system that costs a few dollars to run, and so reconstruct your location and time.
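A rough sketch of the estimate and the forward-hashing idea described above. The three constants are the figures quoted in the comment; everything else is an assumption for illustration: `epoch_tag` is a hypothetical stand-in (SHA-256 over an integer epoch plus sorted BSSIDs), not the project's actual hashing scheme, and the BSSIDs are made up. With the unrounded 170 million locations the product is about 21.8 billion hashes per epoch, and with the rounded 200 million it is 25.6 billion, so the order of magnitude holds either way.

```python
import hashlib

# Figures quoted in the comment; all of them are rough estimates.
NETWORKS = 1_700_000_000     # networks reported by Wigle
APS_PER_LOCATION = 10        # APs in range at a typical spot
HASHES_PER_LOCATION = 128    # segmented hashes per location

locations = NETWORKS // APS_PER_LOCATION
hashes_per_epoch = locations * HASHES_PER_LOCATION


def epoch_tag(bssids, epoch):
    """Hypothetical tag: SHA-256 over a predictable epoch number and the sorted BSSIDs."""
    payload = str(epoch) + "|" + ",".join(sorted(bssids))
    return hashlib.sha256(payload.encode()).hexdigest()


def forward_tags(bssids, start_epoch, count):
    """Precompute tag -> epoch for future epochs.

    This only works because the salt (the epoch number) is assumed
    to be predictable by anyone.
    """
    return {epoch_tag(bssids, e): e for e in range(start_epoch, start_epoch + count)}


if __name__ == "__main__":
    print(f"{locations:,} locations, {hashes_per_epoch:,} hashes per epoch")

    # Made-up BSSIDs standing in for one Wigle cluster.
    cluster = ["aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02", "aa:bb:cc:00:00:03"]
    table = forward_tags(cluster, start_epoch=1000, count=5)

    # A tag observed later is matched back to its epoch by table lookup.
    observed = epoch_tag(cluster, 1003)
    print("matched epoch:", table.get(observed))
```

If the salt were a local secret instead of a public epoch counter, `forward_tags` could not be computed by an outsider, which is the point of the "something local" suggestion.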
waerhert 9 hours ago
I see, this sheds some new light on your initial concerns. I'm aware an attacker can keep pretending to be inside an environment once they've seen it. I wasn't accounting for a scenario where an attacker has a huge database for queries like coords -> list of wifi networks. I was under the assumption that services like Wigle only provided the reverse lookup (wifi -> coords). Indeed, an attacker could potentially reverse the LSH tags if they hashed the wifi environment within very small geofences. It's a bit of a needle-in-a-haystack problem, but not an impossible one with enough resources.

I wouldn't say it's a perfect system and I don't mind it falling apart under scrutiny. I just found it an interesting idea, so I really appreciate you thinking along here.

Edit: Maybe some preshared group hash (kinda beats the point), or combining multiple modalities (e.g. bluetooth, shared interests), or some kind of proof-of-work token could help mitigate some of these issues. I guess anything that raises the cost of an attack helps in this case? Or anything that really pins down environment + time, like what smath described in his comment.

In essence, the core idea of minhash + LSH works, and it doesn't limit you to just wifi networks. The key is being able to grab a fingerprint that is unique enough and different enough each epoch. Wifi networks are just easy to grab compared with something more low-level like an AP's beacon timing interval jitter.
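For readers unfamiliar with the minhash + LSH idea mentioned above, here is a minimal sketch. The parameters (128 hashes, 4 rows per band), the use of SHA-256, and the way the epoch is mixed in are all assumptions for illustration, not the project's actual design.

```python
import hashlib

NUM_HASHES = 128   # assumed signature length
BAND_SIZE = 4      # assumed rows per band, giving 32 bands


def _hash(seed, epoch, item):
    """64-bit hash of one item, salted with a seed and the epoch."""
    data = f"{seed}|{epoch}|{item}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def minhash(items, epoch):
    """MinHash signature: for each seed, the smallest hash over the set."""
    return [min(_hash(seed, epoch, it) for it in items) for seed in range(NUM_HASHES)]


def lsh_tags(items, epoch):
    """Split the signature into bands and hash each band into a short tag."""
    sig = minhash(items, epoch)
    tags = set()
    for start in range(0, NUM_HASHES, BAND_SIZE):
        band = ",".join(str(v) for v in sig[start:start + BAND_SIZE])
        tag = hashlib.sha256(f"{epoch}|{start}|{band}".encode()).hexdigest()[:16]
        tags.add(tag)
    return tags


if __name__ == "__main__":
    # Made-up network names: two phones in the same room, one sees an extra AP.
    phone_a = {f"net-{i}" for i in range(10)}
    phone_b = phone_a | {"net-extra"}
    far_away = {f"other-{i}" for i in range(10)}

    print("shared tags, nearby:  ", len(lsh_tags(phone_a, 7) & lsh_tags(phone_b, 7)))
    print("shared tags, far away:", len(lsh_tags(phone_a, 7) & lsh_tags(far_away, 7)))
    print("shared tags, next epoch:", len(lsh_tags(phone_a, 7) & lsh_tags(phone_a, 8)))
```

Two phones with mostly overlapping network sets are very likely to share at least one band tag in the same epoch, while unrelated sets or different epochs should share none. Nothing here is limited to wifi: any set of observable items can be fed to `lsh_tags`. Note that the epoch in this sketch is a public value, so it does not address the precomputation concern raised earlier in the thread.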