Remix.run Logo
dochtman a day ago

Part of the fingerprint is stuff like the ordering of extensions, which Chrome could easily do but AFAIK doesn’t.

(AIUI Google’s Play Store is one of the biggest TLS fingerprinting culprits.)

shiomiru a day ago | parent | next [-]

Chrome has randomized its ClientHello extension order for two years now.[0]

The companies to blame here are solely the ones employing these fingerprinting techniques, and those relying on services of these companies (which is a worryingly large chunk of the web). For example, after the Chrome change, Cloudflare just switched to a fingerprinter that doesn't check the order.[1]

[0]: https://chromestatus.com/feature/5124606246518784

[1]: https://blog.cloudflare.com/ja4-signals/

nonrandomstring a day ago | parent | next [-]

> blame here are solely the ones employing these fingerprinting techniques,

Sure. And it's a tragedy. But when you look at the bot situation and the sheer magnitude of resource abuse out there, you have to see it from the other side.

FWIW the conversation mentioned above, we acknowledged that and moved on to talk about behavioural fingerprinting and why it makes sense not to focus on the browser/agent alone but what gets done with it.

NavinF a day ago | parent [-]

Last time I saw someone complaining about scrapers, they were talking about 100gib/month. That's 300kbps. Less than $1/month in IP transit and ~$0 in compute. Personally I've never noticed bots show up on a resource graph. As long as you don't block them, they won't bother using more than a few IPs and they'll backoff when they're throttled

marcus0x62 a day ago | parent | next [-]

For some sites, things are a lot worse. See, for example, Jonathan Corbet's report[0].

0 - https://social.kernel.org/notice/AqJkUigsjad3gQc664

lmz a day ago | parent | prev | next [-]

How can you say it's $0 in compute without knowing if the data returned required any computation?

nonrandomstring 17 hours ago | parent | prev [-]

Didn't rachelbytheebay post recently that her blog was being swamped? I've heard that from a few self-hosting bloggers now. And Wikipedia has recently said more than half of traffic is noe bots. ARe you claiming this isn't a real problem?

fc417fc802 a day ago | parent | prev [-]

> The companies to blame here are solely the ones employing these fingerprinting techniques,

Let's not go blaming vulnerabilities on those exploiting them. Exploitation is also bad but being exploitable is a problem in and of itself.

shiomiru 15 hours ago | parent | next [-]

> Let's not go blaming vulnerabilities on those exploiting them. Exploitation is also bad but being exploitable is a problem in and of itself.

There's "vulnerabilities" and there's "inherent properties of a complex protocol that is used to transfer data securely". One of the latter is that metadata may differ from client to client for various reasons, inside the bounds accepted in the standard. If you discriminate based on such metadata, you have effectively invented a new proprietary protocol that certain existing browsers just so happen to implement.

It's like the UA string, but instead of just copying a single HTTP header, new browsers now have to reverse engineer the network stack of existing ones to get an identical user experience.

fc417fc802 15 hours ago | parent [-]

I get that. I don't condone the behavior of those doing the fingerprinting. But what I'm saying is that the fact that it is possible to fingerprint should in pretty much all cases be viewed as a sort of vulnerability.

It isn't necessarily a critical vulnerability. But it is a problem on some level nonetheless. To the extent possible you should not be leaking information that you did not intend to share.

A protocol that can be fingerprinted is similar to a water pipe with a pinhole leak. It still works, it isn't (necessarily) catastrophic, but it definitely would be better if it wasn't leaking.

Jubijub 3 hours ago | parent | prev [-]

I’m sorry but you comment shows you never had to fight this problem a scale. The challenge is not small time crawlers, the challenge is blocking large / dedicated actors. The problem is simple : if there is more than X volume of traffic per <aggregation criteria >, block it. Problem : most aggregation criteria are trivially spoofable, or very cheap to change : - IP : with IPv6 this is not an issue to rotate your IP often - UA : changing this is scraping 101 - SSL fingerprint : easy to use the same as everyone - IP stack fingerprint : also easy to use a common one - request / session tokens : it’s cheap to create a new session You can force login, but then you have a spam account creation challenge, with the same issues as above, and depending on your infra this can become heavy

Add to this that the minute you use a signal for detection, you “burn” it as adversaries will avoid using it, and you lose measurement thus the ability to know if you are fixing the problem at all.

I worked on this kind of problem for a FAANG service, whoever claims it’s easy clearly never had to deal with motivated adversaries

gruez a day ago | parent | prev [-]

What's the advantage of randomizing the order, when all chrome users already have the same order? Practically speaking there's a bazillion ways to fingerprint Chrome besides TLS cipher ordering, that it's not worth adding random mitigations like this.