vessenes 9 hours ago

Thanks for the detailed response! I completely missed the band division / similarity plan - I’ll read more thoroughly next time.

I thought about geo largely because it radically changes the order of magnitude of work necessary; it lets you segment ‘possible’ subsets of APs down to sets of say 100, not millions, and changes the combinatorics. A side effect is knowing a rough spatial location.

Off the top of my head, I don’t think that epochs alone make a big difference. If I want to see if you’ve been somewhere, or tell you I’m somewhere, why not take the 3-4 networks you mentioned, and forward hash them for the next million epochs?

Or, more ambitiously, why not take 3-4 networks each from the geo indexed clusters available at https://wigle.net/ and do the forward and backward epochs, letting me track where you’ve been and pretend to be near you any time in the future?
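To make the forward-hashing idea in the last two paragraphs concrete, here is a minimal sketch. It assumes a tag is just a hash of the epoch number plus the visible networks; the real tag derivation is not described in this thread, so `epoch_tag`, `precompute_tags`, and the BSSIDs below are hypothetical stand-ins.

```python
import hashlib

def epoch_tag(bssids, epoch):
    """HYPOTHETICAL tag: SHA-256 over the epoch number and the sorted BSSIDs.
    The actual scheme under discussion may differ; this only illustrates
    what happens when the epoch salt is predictable."""
    material = str(epoch) + "|" + ",".join(sorted(bssids))
    return hashlib.sha256(material.encode()).hexdigest()

def precompute_tags(bssids, start_epoch, count):
    """Build a lookup table tag -> epoch for `count` consecutive epochs."""
    return {epoch_tag(bssids, e): e
            for e in range(start_epoch, start_epoch + count)}

# Networks an attacker has seen once, or pulled from a geo-indexed database.
known = ["aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02", "aa:bb:cc:00:00:03"]

# Hash forward over many epochs ahead of time.
table = precompute_tags(known, start_epoch=1000, count=10_000)

# Later, a tag published by a device at that location is simply looked up.
observed = epoch_tag(known, 4242)
print(table.get(observed))  # prints 4242
```

Because nothing in the tag is secret or local, the table can be built long before the device ever publishes anything.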

Wigle reports 1.7bn networks; a rough look at a suburban street near me shows most places have about 10 within a reasonable range. So call it 200 million “locations” with 128 segmented hashes each, or roughly 25 billion hashes per epoch. I think we’re in the “seconds per epoch” range for a reasonably compute-heavy server to cover the entire space.
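A quick back-of-envelope check of that estimate, using the figures quoted above. The hash rate is purely an assumption for illustration, not a measured number for any particular server.

```python
# Figures taken from the comment above.
NETWORKS = 1_700_000_000       # networks reported by Wigle
NETWORKS_PER_LOCATION = 10     # rough count in range at one spot
HASHES_PER_LOCATION = 128      # segmented hashes per location

# ASSUMPTION: attacker hash throughput in hashes per second.
HASH_RATE = 1_000_000_000

locations_exact = NETWORKS // NETWORKS_PER_LOCATION   # 170 million
locations_rounded = 200_000_000                       # "call it 200 million"

hashes_per_epoch = locations_rounded * HASHES_PER_LOCATION
seconds_per_epoch = hashes_per_epoch / HASH_RATE

print(locations_exact)      # 170000000
print(hashes_per_epoch)     # 25600000000
print(seconds_per_epoch)    # 25.6
```

By this arithmetic the total is about 25.6 billion hashes per epoch, which still lands in the "seconds per epoch" range at the assumed rate.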

Upshot - I think the salting needs to be something local / not predictable or stored remotely.
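One possible reading of "not predictable" is a salt derived from a secret the attacker does not hold. The sketch below uses an HMAC keyed with such a secret; `keyed_tag` and the key handling are assumptions for illustration, and how participants would obtain the secret is exactly the open question.

```python
import hashlib
import hmac
import secrets

def keyed_tag(secret, bssids, epoch):
    """HYPOTHETICAL tag: HMAC-SHA256 over the epoch and the sorted BSSIDs.
    Without `secret`, an attacker cannot precompute tags from a public
    network database, even when the epoch itself is predictable."""
    msg = (str(epoch) + "|" + ",".join(sorted(bssids))).encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()

bssids = ["aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02"]

group_secret = secrets.token_bytes(32)     # known only to participants
attacker_guess = secrets.token_bytes(32)   # attacker has to guess the key

same = keyed_tag(group_secret, bssids, 42) == keyed_tag(group_secret, bssids, 42)
leaked = keyed_tag(attacker_guess, bssids, 42) == keyed_tag(group_secret, bssids, 42)
print(same, leaked)
```

The trade-off is that anything shared in advance limits discovery to people who already have the key.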

I hope these comments land well. I like the idea a lot, and I don’t fully understand the system, but as I understand it, it does not offer privacy: using a system that costs a few dollars to run, I could replay any phone’s hashes and reconstruct your location and time, if my understanding is correct.

waerhert 9 hours ago | parent [-]

I see, this sheds some new light on your initial concerns. I'm aware an attacker can keep pretending to be inside an environment once they've seen it. I wasn't accounting for a scenario where an attacker has a huge database for queries like coords -> list of wifi networks. I was under the assumption that services like Wigle only provided the reverse lookup (wifi -> coords). Indeed, an attacker could potentially reverse the LSH tags if they hashed the wifi environment within very small geofences. It's a bit of a needle-in-a-haystack problem, but not an impossible one with enough resources. I wouldn't say it's a perfect system and I don't mind it falling apart under scrutiny. I just found it an interesting idea, so I really appreciate you thinking along here.

Edit: Maybe some preshared group hash (kinda beats the point), combining multiple modalities (e.g. bluetooth, shared interests), or some kind of proof-of-work token could help mitigate some of these issues. I guess anything to reduce the time to attack helps in this case? Or anything that really pins down environment + time, like what smath described in his comment. In essence, the core idea of minhash + lsh works, and it doesn't limit you to just wifi networks. The key is being able to grab a fingerprint that is unique enough and different enough each epoch. Wifi networks are just easier to grab than something lower level, like an AP's beacon timing interval jitter.
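For readers following along, a minimal sketch of the minhash + lsh idea with band division. The parameters (128 hashes, 32 bands), the use of SHA-256, and the way the epoch is mixed in are all assumptions for illustration; the actual implementation may be quite different.

```python
import hashlib

NUM_HASHES = 128   # ASSUMPTION: signature length
BANDS = 32         # ASSUMPTION: 32 bands of 4 rows each

def _h(seed, item):
    """64-bit hash of `item` under a string seed."""
    data = f"{seed}:{item}".encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def minhash(items, epoch):
    """MinHash signature: for each seeded hash function, keep the minimum
    hash over the set. The epoch is mixed into the seed so that signatures
    change from one epoch to the next."""
    return [min(_h(f"{epoch}:{i}", x) for x in items)
            for i in range(NUM_HASHES)]

def band_tags(signature):
    """Split the signature into bands and hash each band into a short tag.
    Two sets become candidate matches if they share at least one tag."""
    rows = NUM_HASHES // BANDS
    tags = []
    for b in range(BANDS):
        chunk = signature[b * rows:(b + 1) * rows]
        tags.append(hashlib.sha256(repr((b, chunk)).encode()).hexdigest()[:16])
    return tags

def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions estimates set similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Two hypothetical devices seeing mostly the same networks.
alice = {"net-a", "net-b", "net-c", "net-d", "net-e"}
bob = {"net-a", "net-b", "net-c", "net-d", "net-x"}

sig_a, sig_b = minhash(alice, epoch=7), minhash(bob, epoch=7)
shared = set(band_tags(sig_a)) & set(band_tags(sig_b))
print(len(shared), estimated_jaccard(sig_a, sig_b))
```

Nothing here depends on the items being wifi networks; any set of strings that fingerprints the environment would work the same way.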

kevin_nisbet 8 hours ago | parent [-]

> I see, this sheds some new light on your initial concerns. I'm aware an attacker can keep pretending to be inside an environment once they've seen it. I wasn't accounting for a scenario where an attacker has a huge database for queries like coords -> list of wifi networks

I think this is the issue: these datasets are out there, and at least the big tech companies have them, since they're used to assist with GPS. I was about to post the same thing as above but saw vessenes beat me to it.

Without thinking about it too hard, I see two directions. One is making real-time observations of the environment that are only relevant at that moment (i.e. sniffing actual wireless frames, even encrypted ones, and making observations on them; however, most devices won't let you go into promiscuous mode to do this). The other is encrypting the messages in flight so only participants can decrypt them (i.e. a model like the Signal protocol with E2E message encryption).

Anyway, this is a cool approach, but the same risk occurred to me: someone could just brute-force the entire dataset to decode every location.