nonrandomstring 3 months ago

> blame here are solely the ones employing these fingerprinting techniques,

Sure. And it's a tragedy. But when you look at the bot situation and the sheer magnitude of resource abuse out there, you have to see it from the other side.

FWIW, in the conversation mentioned above we acknowledged that and moved on to talk about behavioural fingerprinting, and why it makes sense not to focus on the browser/agent alone but on what gets done with it.
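
To make that concrete, here's a minimal sketch of the idea: score a client by what it does (request rate, how many distinct URLs it touches), not by what its user-agent string claims. The window, threshold, and weights below are invented for illustration, not anything from that conversation.

    from collections import defaultdict, deque
    import time

    WINDOW = 60        # seconds of history kept per client (illustrative)
    RATE_LIMIT = 30    # requests per window before we get suspicious (illustrative)
    history = defaultdict(deque)

    def behaviour_score(client_ip, path, now=None):
        """Return a rough 0..1 'bot-likeness' score from behaviour alone."""
        now = now if now is not None else time.time()
        q = history[client_ip]
        q.append((now, path))
        while q and q[0][0] < now - WINDOW:   # drop requests older than the window
            q.popleft()
        rate = len(q) / WINDOW                       # requests/second in the window
        breadth = len({p for _, p in q}) / len(q)    # crawlers fan out across URLs
        return min(1.0, 0.5 * rate / (RATE_LIMIT / WINDOW) + 0.5 * breadth)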

NavinF 3 months ago | parent [-]

Last time I saw someone complaining about scrapers, they were talking about 100 GiB/month. That's roughly 300 kbps: less than $1/month in IP transit and ~$0 in compute. Personally, I've never noticed bots show up on a resource graph. As long as you don't block them, they won't bother using more than a few IPs, and they'll back off when they're throttled.
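
For reference, the back-of-the-envelope arithmetic behind that bitrate (assuming a 30-day month):

    # 100 GiB spread evenly over a 30-day month, expressed as a bitrate.
    gib = 100 * 2**30            # bytes
    seconds = 30 * 24 * 3600     # seconds in the month
    print(f"{gib * 8 / seconds / 1000:.0f} kbps")  # ~331 kbps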

marcusb 3 months ago | parent | next [-]

For some sites, things are a lot worse. See, for example, Jonathan Corbet's report[0].

0 - https://social.kernel.org/notice/AqJkUigsjad3gQc664

NavinF 3 months ago | parent [-]

He provides no info. Requests per second? 95th-percentile Mbps? How does he know the requests come from an "AI scraper" as opposed to a normal L7 DDoS? LWN is a pretty simple site; it should be easy to saturate 10G ports serving it.
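
Those are the kinds of numbers you could pull straight from access logs. A minimal sketch, assuming each log line starts with an epoch timestamp and a response size in bytes (real logs need real parsing; this is illustrative only):

    import sys
    from collections import Counter

    # Per-second totals from an access log read on stdin.
    reqs, bits = Counter(), Counter()
    for line in sys.stdin:
        ts, nbytes = line.split()[:2]
        sec = int(float(ts))
        reqs[sec] += 1
        bits[sec] += int(nbytes) * 8

    rates = sorted(b / 1e6 for b in bits.values())   # Mbps, one sample per second
    p95 = rates[int(0.95 * (len(rates) - 1))] if rates else 0.0
    peak_rps = max(reqs.values(), default=0)
    print(f"peak req/s: {peak_rps}, p95 bandwidth: {p95:.2f} Mbps")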

nonrandomstring 3 months ago | parent | prev | next [-]

Didn't rachelbythebay post recently that her blog was being swamped? I've heard the same from a few self-hosting bloggers now. And Wikipedia has recently said more than half of its traffic is now bots. Are you claiming this isn't a real problem?

NavinF 3 months ago | parent [-]

How exactly can a blog get swamped? It takes ~0 compute per request. Yes, I'm claiming this is a fake problem.

lmz 3 months ago | parent | prev [-]

How can you say it's $0 in compute without knowing whether the data returned required any computation?

NavinF 3 months ago | parent [-]

Look at the sibling replies. All the kvetching comes from blogs and simple websites, not from the ones that consume compute per request.