| |
| ▲ | nonrandomstring a day ago | parent | next [-] | | > blame here are solely the ones employing these fingerprinting techniques, Sure. And it's a tragedy. But when you look at the bot situation and
the sheer magnitude of resource abuse out there, you have to see it
from the other side. FWIW, in the conversation mentioned above, we acknowledged that and moved
on to talk about behavioural fingerprinting and why it makes sense
not to focus on the browser/agent alone but what gets done with it. | | |
| ▲ | NavinF a day ago | parent [-] | | Last time I saw someone complaining about scrapers, they were talking about 100 GiB/month. That's roughly 300 kbps: less than $1/month in IP transit and ~$0 in compute. Personally, I've never noticed bots show up on a resource graph. As long as you don't block them, they won't bother using more than a few IPs, and they'll back off when they're throttled. | | |
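For anyone who wants to check the arithmetic in the comment above, here is a minimal sketch of the conversion from a monthly transfer volume to an average bit rate. The 30-day month and the function name `average_kbps` are assumptions made for illustration; they are not from the thread.

```python
# Assumption: a "month" is 30 days; the thread does not say which length was meant.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def average_kbps(gib_per_month: float) -> float:
    """Average rate in kilobits per second for a monthly volume given in GiB."""
    bits = gib_per_month * 2**30 * 8  # 1 GiB = 2**30 bytes, 8 bits per byte
    return bits / SECONDS_PER_MONTH / 1000  # 1 kbps = 1000 bits per second


rate = average_kbps(100)
print(f"100 GiB/month is about {rate:.0f} kbps on average")
```

This gives a little over 330 kbps, so "300 kbps" is a fair round number. It is an average, though: it says nothing about bursts or about how much work each request costs the server.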
| ▲ | marcus0x62 a day ago | parent | next [-] | | For some sites, things are a lot worse. See, for example, Jonathan Corbet's report[0]. 0 - https://social.kernel.org/notice/AqJkUigsjad3gQc664 | |
| ▲ | lmz a day ago | parent | prev | next [-] | | How can you say it's $0 in compute without knowing if the data returned required any computation? | |
| ▲ | nonrandomstring 17 hours ago | parent | prev [-] | | Didn't rachelbythebay post recently that her blog was being swamped?
I've heard that from a few self-hosting bloggers now. And Wikipedia
has recently said more than half of its traffic is now bots. Are you
claiming this isn't a real problem? |
|
| |
| ▲ | fc417fc802 a day ago | parent | prev [-] | | > The companies to blame here are solely the ones employing these fingerprinting techniques, Let's not go blaming vulnerabilities on those exploiting them. Exploitation is also bad but being exploitable is a problem in and of itself. | | |
| ▲ | shiomiru 15 hours ago | parent | next [-] | | > Let's not go blaming vulnerabilities on those exploiting
them. Exploitation is also bad but being exploitable is a problem in and
of itself. There's "vulnerabilities" and there's "inherent properties of a complex
protocol that is used to transfer data securely". One of the latter is
that metadata may differ from client to client for various reasons,
inside the bounds accepted in the standard. If you discriminate based
on such metadata, you have effectively invented a new proprietary
protocol that certain existing browsers just so happen to implement. It's like the UA string, but instead of just copying a single HTTP
header, new browsers now have to reverse engineer the network stack of
existing ones to get an identical user experience. | | |
| ▲ | fc417fc802 15 hours ago | parent [-] | | I get that. I don't condone the behavior of those doing the fingerprinting. But what I'm saying is that the fact that it is possible to fingerprint should in pretty much all cases be viewed as a sort of vulnerability. It isn't necessarily a critical vulnerability. But it is a problem on some level nonetheless. To the extent possible you should not be leaking information that you did not intend to share. A protocol that can be fingerprinted is similar to a water pipe with a pinhole leak. It still works, it isn't (necessarily) catastrophic, but it definitely would be better if it wasn't leaking. |
| |
| ▲ | Jubijub 3 hours ago | parent | prev [-] | | I’m sorry, but your comment shows you never had to fight this problem at scale. The challenge is not small-time crawlers; the challenge is blocking large, dedicated actors. The rule is simple: if there is more than X volume of traffic per <aggregation criterion>, block it.
Problem: most aggregation criteria are trivially spoofable, or very cheap to change:
- IP: with IPv6, rotating your IP often is not an issue
- UA: changing this is scraping 101
- SSL fingerprint: easy to use the same one as everyone else
- IP stack fingerprint: also easy to use a common one
- request / session tokens: it’s cheap to create a new session
You can force login, but then you have a spam account creation challenge, with the same issues as above, and depending on your infra this can become heavy. Add to this that the minute you use a signal for detection, you “burn” it: adversaries will avoid using it, and you lose measurement, and with it the ability to know whether you are fixing the problem at all. I worked on this kind of problem for a FAANG service. Whoever claims it’s easy clearly never had to deal with motivated adversaries. |
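The "more than X traffic per aggregation criterion" rule above can be sketched as a fixed-window counter keyed by whatever criterion you pick. This is only an illustration of why a cheap-to-rotate key defeats the rule, not how any real service does it; the class name `FixedWindowLimiter`, the request dict layout, and the limits are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Tuple

Request = Dict[str, str]  # hypothetical request shape: {"ip": ..., "ua": ...}


class FixedWindowLimiter:
    """Allow at most `limit` requests per key per fixed time window."""

    def __init__(self, key_fn: Callable[[Request], str], limit: int,
                 window_seconds: float) -> None:
        self.key_fn = key_fn
        self.limit = limit
        self.window_seconds = window_seconds
        # (key, window index) -> request count. Old windows are never evicted
        # here, which a real deployment would have to handle.
        self._counts: Dict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, request: Request, now: float) -> bool:
        bucket = (self.key_fn(request), int(now // self.window_seconds))
        self._counts[bucket] += 1
        return self._counts[bucket] <= self.limit


def by_ip(request: Request) -> str:
    return request["ip"]


# Client A keeps one address; client B uses a fresh IPv6 address per request.
static_requests = [{"ip": "2001:db8::1", "ua": "bot"} for _ in range(10)]
rotating_requests = [{"ip": f"2001:db8::{i:x}", "ua": "bot"} for i in range(10)]

limiter_a = FixedWindowLimiter(by_ip, limit=3, window_seconds=60)
limiter_b = FixedWindowLimiter(by_ip, limit=3, window_seconds=60)

static_allowed = sum(limiter_a.allow(r, now=0.0) for r in static_requests)
rotating_allowed = sum(limiter_b.allow(r, now=0.0) for r in rotating_requests)

print(static_allowed, rotating_allowed)
```

The static client is cut off after 3 requests, while the rotating client gets all 10 through. Swapping `by_ip` for a key on UA, fingerprint, or session token only helps for as long as the adversary doesn't notice and rotate that value too.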
|
|