everdrive 3 days ago

I'm deeply confused by a lot of the privacy discourse here. There seem to be opposing goals between preventing the fingerprinting mechanisms and just preventing uniqueness. Under the "preventing uniqueness" model, my Linux computer with custom Firefox and no fonts, and no js, etc. is the "most fingerprint-able" because it's the most unique. Whereas grandma on Windows and Chrome is "less unique," and therefore in some sense less fingerprint-able.

I think there are a few potential problems with this view that I never see discussed:

- Firefox sends some dummy data when making use of privacy.resistFingerprinting, and so you should get a unique fingerprint _every time_ you visit a site, so the fact alone that you're unique might potentially not matter if you're _differently_ unique every time you visit the site. Is there a flaw in this line of thinking?

- My understanding is that the primary utility of browser fingerprinting is for advertising / tracking. In other words, the bulk of the population an advertiser would actually care about would be the huge middle of the bell curve on Chrome using Windows, not the privacy nuts on Linux with a custom browser config. In other words, if "blending in with the crowd" really worked I would think that tracking companies would fail against the most important and largest part of the user pool. If anything, it's more important to target grandma as she will actually click on ads and buy stuff online compulsively.

Can anyone speak to these points? I often feel like the pro-privacy people are just crawling in the dark and not really aware of what real-world tracking is actually occurring vs. what might be possible in a research paper. Maybe I'm just the one that's confused?

rsync 3 days ago | parent | next [-]

"... so the fact alone that you're unique might potentially not matter if you're _differently_ unique every time you visit the site. Is there a flaw in this line of thinking?"

No, you're thinking correctly and the odd discourse that you (and I) see is based on two implicit assumptions:

1) Your threat model is a global observer that notices - and tracks and exploits - your supposed perfect per-request uniqueness.

2) Our browsers do not give us fine grained control over every observable value so if only one variable is randomized per request, that can be discarded and you are still identifiable by (insert collection of resolution and fan speed or mouse jiggle or whatever).

Item (1) I don't care about. I'd prefer per-hit uniqueness to what I have now.

Item (2) is a valid concern and speaks to the blunt and user-hostile tools available to us (browsers, that is) which barely rise to the level of any definition of "user agent" we might imagine.

I repeat: I would much prefer fully randomized per-request variables and I don't care how unique they are relative to other traffic. I care about how unique they are relative to my other requests. Unfortunately, I am wary of browser plug-ins and have no good way to build a trust model with the 12 different plug-ins this behavior would require. This is the fault of firefox and the bad decisions they continue to make.
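Item (2) can be sketched in a few lines. This is a toy illustration, not how any particular tracker works: the attribute names (`resolution`, `timezone`, `fonts`, `canvas_hash`) are hypothetical, and the tracker is assumed to already have several visits it believes came from one browser. Only one value is randomized per request, so dropping whichever attributes vary leaves a stable identifier.

```python
import hashlib
import random


def fingerprint(attrs: dict) -> str:
    """Hash the sorted attribute pairs into a short identifier."""
    canonical = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]


def observe(rng: random.Random) -> dict:
    """One request: only the (hypothetical) canvas value is randomized."""
    return {
        "resolution": "2560x1440",
        "timezone": "UTC-8",
        "fonts": "none",
        "canvas_hash": rng.getrandbits(32),
    }


def stable_keys(visits: list) -> set:
    """Attributes whose value never changed across the observed visits."""
    return {k for k in visits[0] if len({v[k] for v in visits}) == 1}


rng = random.Random(0)
visits = [observe(rng) for _ in range(5)]

# Hashing everything gives a different ID on every request.
naive_ids = {fingerprint(v) for v in visits}

# Discarding the unstable attribute collapses them back to one ID.
keep = stable_keys(visits)
filtered_ids = {fingerprint({k: v[k] for k in keep}) for v in visits}
```

Under these assumptions `naive_ids` holds one entry per visit while `filtered_ids` holds a single entry, which is why randomizing one variable is not enough on its own.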

franga2000 3 days ago | parent [-]

> Unfortunately, I am wary of browser plug-ins and have no good way to build a trust model with the 12 different plug-ins this behavior would require. This is the fault of firefox and the bad decisions they continue to make.

I see so many people paranoid about browser extensions and I really don't see the point. It's like any other software. If you trust the author, install it. If you don't trust the author, check the source code, install it (ideally from source), disable automatic updates and subscribe to the changelog. Is this any different from any other thing you install on your device?

gruez 3 days ago | parent | prev | next [-]

>- Firefox sends some dummy data when making use of privacy.resistFingerprinting, and so you should get a unique fingerprint _every time_ you visit a site, so the fact alone that you're unique might potentially not matter if you're _differently_ unique every time you visit the site. Is there a flaw in this line of thinking?

Yes, because those randomized results can be detected, and that can be incorporated into your fingerprint. Think of a site that asks you about your birthday. If you put in obviously false answers like "February 31, 1901", a smart implementation could just round those answers off to "lies about birthday" rather than taking them at face value.
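A rough sketch of that "rounding off" idea, using the birthday analogy. The function name, the label string, and the 120-year plausibility cutoff are all assumptions made for illustration:

```python
from datetime import date


def normalize_birthday(year: int, month: int, day: int, today: date) -> str:
    """Return the ISO date if it is plausible, else a single 'liar' bucket."""
    try:
        born = date(year, month, day)  # raises ValueError for February 31
    except ValueError:
        return "lies-about-birthday"
    # Age in whole years; subtract one if the birthday has not occurred yet.
    age = today.year - born.year - ((today.month, today.day) < (born.month, born.day))
    if age < 0 or age > 120:  # assumed plausibility window
        return "lies-about-birthday"
    return born.isoformat()


today = date(2025, 1, 1)
fake = normalize_birthday(1901, 2, 31, today)
real = normalize_birthday(1990, 6, 15, today)
```

Every implausible answer lands in the same bucket, so the randomness itself carries no information, but membership in the bucket does.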

>- My understanding is that the primary utility of browser fingerprinting is for advertising / tracking. In other words, the bulk of the population an advertiser would actually care about would be the huge middle of the bell curve on Chrome using Windows, not the privacy nuts on Linux with a custom browser config. In other words, if "blending in with the crowd" really worked I would think that tracking companies would fail against the most important and largest part of the user pool. If anything, it's more important to target grandma as she will actually click on ads and buy stuff online compulsively.

The problem is all this fingerprinting/profiling machinery ends up building a profile on privacy conscious people, even if they're impossible to sell to. That can later be exploited if the data gets leaked, or the government demands it. "I'm not a normie so nobody would want to show ads to me" doesn't address this.

throwawayqqq11 3 days ago | parent [-]

Advertisers try to reidentify you and match you against their database. The less information you give them and the more randomized it is, the less certain they can be that it's you again.

If I use my locked-down Firefox with a VPN where potentially a handful of other brills like me come out on the other end, I am not concerned about them building a profile of me.

gruez 3 days ago | parent [-]

>Advertisers try to reidentify you and match you against their database. The less information you give them and the more randomized it is, the less certain they can be that it's you again.

This assumes the randomization is done properly, otherwise it just turns into a signal of "installs privacy extensions", which can still be used for targeting, as a sibling commenter has mentioned.

bigbadfeline 2 days ago | parent [-]

> otherwise it just turns into a signal of "installs privacy extensions"

The more such signals, the merrier, the parent of your comment addressed that ("other brills"). Instead of going in circles, it would be better to encourage people to evaluate and use such extensions. Telling them that they don't work is a self-defeating half-truth.

aerostable_slug 3 days ago | parent | prev | next [-]

I used to work in adtech a long while back. We found that our system could effectively target people who tried not to be targeted. By that I mean we realized a better ROI than without said targeting and click-throughs & conversions were happening for our customers at a nice rate.

At the end of the day the object of the exercise is generally less about building a perfect profile of a person and a lot more about getting said person to buy something. We found our system worked very well at figuring out what ads worked on privacy-conscious people and our customers saw a nice ROI from it.

In fact, it turns out pro-privacy technically skilled people cluster nicely and it's entirely possible to sell them stuff, and their attempts to be less 'profileable' than normal actually helped our mission (which was advertising, an endeavor that in my experience doesn't GAF about violating a given person's privacy in the way the pro-privacy crowd often thinks it does).

Take from that what you will.

everdrive 3 days ago | parent | next [-]

Can you describe in more detail what sort of techniques were used to target and track people? What sort of privacy mitigations were feckless?

aerostable_slug 3 days ago | parent [-]

It wasn't so much that privacy mitigations were feckless, it was the fact that people who did things like falsify their User-Agent strings tended to cluster into distinct groups very nicely, and hence it was easy for the targeting algorithms to feed them effective ads, landing pages, etc.

The targeting system went "oh goody, privacy geeks" and was able to very effectively do its job. This is because ad tech systems care less about you as everdrive the named individual with privacy interests and other human aspects, and more about you as some potential consumer of goods.

While it's possible to use the systems to profile people in the sense that a stalker might, that's not really the intent (in the way people like to think of it). I (in the past tense, I don't do adtech anymore) honestly don't care about you, I just want you to buy shit from the people who pay me to sell you their particular flavor of shit. If you hiding your exact name or browser details or whatever makes that more likely (it turns out it did), then hooray! There's no conflict there, where to some there would be (because their assumptions about motive are all wrong).

In terms of what techniques, we found machine learning (stats) way back then did a pretty good job of clustering people based on things browsers return (monitor resolution, OS, etc.) coupled with time of day, search terms, and other things you can't really suppress. A completely contrived example might be pushing expensive pediatric electrolytes to someone with a large-screened Mac looking up baby flu symptoms at 2 am. The "system" did a far better job of real time targeting with this stuff than any human could, and the things it would cluster on were often rather unintuitive.
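To make the shape of that concrete, here is a deliberately simplified stand-in. It is not the system described above and uses no machine learning: it just buckets impressions by coarse, hard-to-suppress features and picks the ad with the best click rate per bucket. All feature values, ad names, and impressions are made up for the example.

```python
from collections import defaultdict

# (coarse features, ad shown, clicked?) -- entirely made-up toy data
impressions = [
    (("mac", "large-screen", "night"), "electrolytes", True),
    (("mac", "large-screen", "night"), "electrolytes", True),
    (("mac", "large-screen", "night"), "headphones", False),
    (("linux", "spoofed-ua", "day"), "mechanical-keyboard", True),
    (("linux", "spoofed-ua", "day"), "mechanical-keyboard", False),
    (("linux", "spoofed-ua", "day"), "headphones", False),
]


def best_ad_per_segment(rows):
    """Map each feature bucket to its (best ad, click rate)."""
    stats = defaultdict(lambda: [0, 0])  # (segment, ad) -> [clicks, shows]
    for segment, ad, clicked in rows:
        entry = stats[(segment, ad)]
        entry[0] += int(clicked)
        entry[1] += 1
    best = {}
    for (segment, ad), (clicks, shows) in stats.items():
        rate = clicks / shows
        if segment not in best or rate > best[segment][1]:
            best[segment] = (ad, rate)
    return best


best = best_ad_per_segment(impressions)
```

Note that "spoofed-ua" is just another feature here: no identity is needed, only a bucket that behaves consistently.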

everdrive 3 days ago | parent [-]

Thanks, this is a really useful reply. With regard to identifying privacy-focused users in your system, I'm sort of imagining the following scenario:

- user has a bunch of privacy mitigations and tweaks in their browser

- user logs into your commerce site, searches for stuff to buy

- commerce site knows who the user is since they're logged in, and per your comment can infer whether or not they're wealthy, have kids, etc. based on the user activity and whether or not the user is likely to have an expensive screen & GPU, etc.

Does that sound right? That's really interesting, and something I'm embarrassed to say I hadn't really thought of. In other words, I've spent a lot of time worrying about cross-site tracking, and advertising domains, etc. However, if I'm purchasing from Amazon they know it's me since I'm just shipping to my own house. Even in a scenario where my browser is magically un-fingerprint-able, it's obviously me since I'm using my account and shipping to my house.

In other words, I may potentially have prevented a bunch of cross-site tracking and fingerprinting. Perhaps when I go to washingtonpost.com they don't know I'm the same person that Amazon knows about. (that might be a best case scenario) However, by virtue of the fact that my privacy config is operating all the time, Amazon has also learned something about me I didn't necessarily need to tell them -- ie, that I'm privacy-focused.

Do you think that's a fair assessment, or am I missing the point?

aerostable_slug 2 days ago | parent [-]

> Amazon has also learned something about me I didn't necessarily need to tell them -- ie, that I'm privacy-focused.

Exactly, and it turns out privacy-focused people tend to be relatively self-similar in many ways and, at least back in the day, were easier to advertise to.

Now, Amazon (or whoever) still doesn't know your name, but they are still targeting you, and what I found super interesting is that it was often more effective targeting than if the person was just part of the bulk of the population. It's kinda like if the system observed that all the nonconformist teens like to wear Doc Martens.

runlaszlorun 3 days ago | parent | prev [-]

Informative, thanks for sharing.

godelski 3 days ago | parent | prev | next [-]

> Whereas grandma on Windows and Chrome is "less unique," and therefore in some sense less fingerprint-able.

I got highly unique on FF, so I tested Safari on an M2 Air. Still says I'm highly unique. I'm on a university campus network, where there are thousands of people with that exact same setup. I don't think I've ever seen a fingerprinting site that doesn't say I'm very unique.

I think the problem I have with these types of sites is that they do not really offer advice on how to become less unique and how to protect one's self. It's probably pretty easy to identify machines through things like canvas fingerprinting or through all the other things that the browser actually exposes. Many privacy browsers like Tor or Mullvad will just send no data to those. That makes them "unique" because there's not many people using browsers that do that but it's unique in a way that makes you fungible. There's unique as in "uncommon" but also unique as "differentiable." I can't understand how these sites never make that distinction.
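The "uncommon" vs. "differentiable" distinction can be put in numbers. The sketch below uses an invented population of 1000 visitors: 10 present the same uniform fingerprint (labelled `tor-uniform` here, a made-up name), and 990 ordinary browsers each expose a distinct value. It reports two things per fingerprint: surprisal in bits (how uncommon it is) and the anonymity set size (how many visitors share it).

```python
import math
from collections import Counter


def report(population: list) -> dict:
    """Per fingerprint: surprisal in bits and number of visitors sharing it."""
    counts = Counter(population)
    total = len(population)
    return {
        fp: {"bits": -math.log2(n / total), "anonymity_set": n}
        for fp, n in counts.items()
    }


# Invented population: 10 identical-looking privacy browsers,
# 990 ordinary browsers that each leak a distinct canvas value.
population = ["tor-uniform"] * 10 + [f"chrome-canvas-{i}" for i in range(990)]
result = report(population)

tor = result["tor-uniform"]
ordinary = result["chrome-canvas-0"]
```

In this toy population the uniform fingerprint is rare (1% of traffic), yet each of its users hides among 10, while every "ordinary" browser is an anonymity set of one. A test site that only reports rarity would call both "unique".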

socalgal2 3 days ago | parent | prev | next [-]

You are correct, the discussion is often unthoughtful and spun.

> the bulk of the population an advertiser would actually care about would be the huge middle of the bell curve on Chrome using Windows

The middle of the bell curve in the USA would be an iPhone and there is very little you can customize. So many people have the same model with the same settings that trying to track by fingerprinting is effectively useless.

Yes, PC/Linux users have more to track. They are the minority though. I'm not saying therefore ignore this issue. But grandma is using her phone. Not a PC.

> Firefox sends some dummy data when making use of privacy.resistFingerprinting, and so you should get a unique fingerprint _every time_ you visit a site

This assumes the fingerprinter can't filter out that random data, and that the feature is actually useful. Some of the things it does sound like they might cause sites to fail or cause other problems. Setting the timezone to something else seems like I'm going to make a reservation for 7pm only to find out it was 7pm in another timezone. Other things it does might not be good for grandma: CSS will report preferred reduced motion as False, and CSS will report preferred contrast as No Preference.

everdrive 3 days ago | parent [-]

I definitely agree with your point, but I think that's what I'm wondering about? Can it actually be filtered out, and are tracking companies actually doing this in practice, or is it like when someone says they bridged an airgap by making two computers' RF spectrum do funny things? Possible in a lab, yes. Something most people need to worry about? No.

I'm not saying this _isn't_ the case for tracking -- I just don't have much of a way to know what techniques are actually being employed in real life.

joahnn_s 3 days ago | parent | prev | next [-]

This is the paradox: Imagine walking dressed in red in the middle of a crowd dressed in black.

Being unique makes one easily identifiable and requires less effort to correlate one's past activity, while non-unique ones are full of noise and yield low confidence.

ranger_danger 3 days ago | parent | prev [-]

The comment by gruez is accurate IMO.

Creepjs actually tries to detect what your browser is lying about and takes that into consideration (or not) based on its heuristics.

I'm still not aware of any FOSS browser (with JS actually enabled and functioning) that can produce a random fingerprint ID on every refresh of the creepjs test site.

But please prove me wrong.

everdrive 3 days ago | parent [-]

I've never heard of creepjs, are there more resources about it and where it's used?