Part of the fingerprint is stuff like the ordering of extensions, which Chrome could easily do but AFAIK doesn’t.

(AIUI Google’s Play Store is one of the biggest TLS fingerprinting culprits.)

shiomiru 3 months ago | parent | next [-]

Chrome has randomized its ClientHello extension order for two years now.[0]

The companies to blame here are solely the ones employing these fingerprinting techniques, and those relying on services of these companies (which is a worryingly large chunk of the web). For example, after the Chrome change, Cloudflare just switched to a fingerprinter that doesn't check the order.[1]

[0]: https://chromestatus.com/feature/5124606246518784

[1]: https://blog.cloudflare.com/ja4-signals/

▲

fc417fc802 3 months ago | parent | next [-]

> The companies to blame here are solely the ones employing these fingerprinting techniques,

Let's not go blaming vulnerabilities on those exploiting them. Exploitation is also bad but being exploitable is a problem in and of itself.

▲

shiomiru 3 months ago | parent | next [-]

> Let's not go blaming vulnerabilities on those exploiting them. Exploitation is also bad but being exploitable is a problem in and of itself.

There's "vulnerabilities" and there's "inherent properties of a complex protocol that is used to transfer data securely". One of the latter is that metadata may differ from client to client for various reasons, inside the bounds accepted in the standard. If you discriminate based on such metadata, you have effectively invented a new proprietary protocol that certain existing browsers just so happen to implement.

It's like the UA string, but instead of just copying a single HTTP header, new browsers now have to reverse engineer the network stack of existing ones to get an identical user experience.

	▲	fc417fc802 3 months ago \| parent [-]
		I get that. I don't condone the behavior of those doing the fingerprinting. But what I'm saying is that the fact that it is possible to fingerprint should in pretty much all cases be viewed as a sort of vulnerability. It isn't necessarily a critical vulnerability. But it is a problem on some level nonetheless. To the extent possible you should not be leaking information that you did not intend to share. A protocol that can be fingerprinted is similar to a water pipe with a pinhole leak. It still works, it isn't (necessarily) catastrophic, but it definitely would be better if it wasn't leaking.

▲

Jubijub 3 months ago | parent | prev [-]

I’m sorry but you comment shows you never had to fight this problem a scale. The challenge is not small time crawlers, the challenge is blocking large / dedicated actors. The problem is simple : if there is more than X volume of traffic per <aggregation criteria >, block it. Problem : most aggregation criteria are trivially spoofable, or very cheap to change : - IP : with IPv6 this is not an issue to rotate your IP often - UA : changing this is scraping 101 - SSL fingerprint : easy to use the same as everyone - IP stack fingerprint : also easy to use a common one - request / session tokens : it’s cheap to create a new session You can force login, but then you have a spam account creation challenge, with the same issues as above, and depending on your infra this can become heavy

Add to this that the minute you use a signal for detection, you “burn” it as adversaries will avoid using it, and you lose measurement thus the ability to know if you are fixing the problem at all.

I worked on this kind of problem for a FAANG service, whoever claims it’s easy clearly never had to deal with motivated adversaries

	▲	immibis 3 months ago \| parent [-]
		Should be easy enough to create a DroneBL for residential proxy services. Since you work on residential proxy detection at a FAANG service, why haven't you done it yet? If they're doing things the above-board way from their own ASN, block their ASN. If they're doing things the above-board way from third-party hosting providers, send abuse reports. Late last year there was a commotion because someone was sending single spoofed SSH SYN packets, from the addresses of Tor nodes, to organizations with extremely sensitive security policies. Many people with Tor nodes got threats of being banned from their hosting provider, over a single packet they didn't even send. They're definitely going to ban people who are doing actual DDoSes from their servers. DDoS is also a federal crime, so if you and they are in the USA, you might consider trying to get them put in prison.

▲

nonrandomstring 3 months ago | parent | prev [-]

> blame here are solely the ones employing these fingerprinting techniques,

Sure. And it's a tragedy. But when you look at the bot situation and the sheer magnitude of resource abuse out there, you have to see it from the other side.

FWIW the conversation mentioned above, we acknowledged that and moved on to talk about behavioural fingerprinting and why it makes sense not to focus on the browser/agent alone but what gets done with it.

▲

NavinF 3 months ago | parent [-]

Last time I saw someone complaining about scrapers, they were talking about 100gib/month. That's 300kbps. Less than $1/month in IP transit and ~$0 in compute. Personally I've never noticed bots show up on a resource graph. As long as you don't block them, they won't bother using more than a few IPs and they'll backoff when they're throttled

▲

marcusb 3 months ago | parent | next [-]

For some sites, things are a lot worse. See, for example, Jonathan Corbet's report[0].

0 - https://social.kernel.org/notice/AqJkUigsjad3gQc664

	▲	NavinF 3 months ago \| parent [-]
		He provides no info. req/s? 95%ile mbps? How does he know the requests come from an "AI-scraper" as opposed to a normal L7 DDoS? LWN is a pretty simple site, it should be easy to saturate 10G ports

▲

nonrandomstring 3 months ago | parent | prev | next [-]

Didn't rachelbytheebay post recently that her blog was being swamped? I've heard that from a few self-hosting bloggers now. And Wikipedia has recently said more than half of traffic is noe bots. ARe you claiming this isn't a real problem?

	▲	NavinF 3 months ago \| parent [-]
		How exactly can a blog get swamped? It takes ~0 compute per request. Yes I'm claiming this is a fake problem

▲

lmz 3 months ago | parent | prev [-]

How can you say it's $0 in compute without knowing if the data returned required any computation?

	▲	NavinF 3 months ago \| parent [-]
		Look at the sibling replies. All the kvetching comes from blogs and simple websites, not the ones that consume compute per request

▲

gruez 3 months ago | parent | prev [-]

What's the advantage of randomizing the order, when all chrome users already have the same order? Practically speaking there's a bazillion ways to fingerprint Chrome besides TLS cipher ordering, that it's not worth adding random mitigations like this.