lm411 3 hours ago
AI companies and notably AI scrapers are a cancer that is destroying what's left of the WWW.

I was hit with a pretty substantial botnet "distributed scraping" attack yesterday.
- About 400,000 different IP addresses over about 3 hours
- Mostly residential IP addresses
- Valid and unique user agents and referrers
- Each IP address would make only a few requests with a long delay in between requests

It would hit the server hard until the server became slow to respond, then it would back off for about 30 seconds, then hit hard again.

I was able to block most of the requests with a combination of user agent and referrer patterns, though some legit users may be blocked.

The attack was annoying, but the even bigger problem is that the data on this website is under license - we have to pay for it, and it's not cheap. We are able to pay for it (barely) with advertising revenue and some subscriptions.

If everyone is getting this data from their "agent" and scrapers, that means no advertising revenue, and soon enough no more website to scrape, jobs lost, nowhere for scrapers to scrape for the data, nowhere for legit users to get the data for free, etc.
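For readers wondering what blocking on "a combination of user agent and referrer patterns" might look like, here is a minimal sketch. The comment does not say which patterns were used, so the two regexes below are purely hypothetical placeholders, and requiring both to match (rather than either) is an assumption made to limit false positives.

```python
import re

# Hypothetical patterns for illustration only; the original comment
# does not disclose the real ones.
BLOCKED_USER_AGENTS = [re.compile(r"Chrome/4\d\.", re.IGNORECASE)]
BLOCKED_REFERRERS = [re.compile(r"^https?://(?:www\.)?referrer-example\.test/", re.IGNORECASE)]


def should_block(user_agent: str, referrer: str) -> bool:
    """Return True only when BOTH headers match a blocked pattern.

    Requiring both is an assumption: it trades some missed bots for
    fewer legitimate users caught by a single stale header.
    """
    ua_hit = any(p.search(user_agent) for p in BLOCKED_USER_AGENTS)
    ref_hit = any(p.search(referrer) for p in BLOCKED_REFERRERS)
    return ua_hit and ref_hit
```

Even with both conditions required, a real visitor on an old browser who arrives from a matching referrer would still be blocked, which is the false-positive risk mentioned above.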
everdrive 2 hours ago
Thanks for sharing the perspective here. I think a lot of folks on HN have rightly said that a lot of the problems with the modern internet are due to the ad-supported business model. I don't think you were ever going to move away from it voluntarily -- too many people support it, even if they grumble about it. But maybe (and likely for worse) LLMs will finally kill this model.
ctoth an hour ago
If you don't mind me asking, what sort of data are you licensing? I noticed that you explicitly don't mention it. | ||||||||
shimman 2 hours ago
Do you not run Anubis or have strict fail2ban rules? I just straight up ban IPs forever if they look up files that will never exist on my servers. That plus Anubis with the strictest settings.
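A minimal sketch of the "ban forever on a request for a file that will never exist" idea, written as plain in-memory Python rather than as an actual fail2ban or Anubis configuration. The trap paths and the `HoneypotBanList` class are hypothetical names chosen for illustration; a real setup would persist the bans and enforce them at the firewall or proxy.

```python
from dataclasses import dataclass, field

# Hypothetical trap paths: files assumed never to exist on this server.
TRAP_PATHS = ("/wp-login.php", "/.env", "/xmlrpc.php")


@dataclass
class HoneypotBanList:
    """Remembers every IP that has ever requested a trap path."""

    banned: set = field(default_factory=set)

    def allow(self, ip: str, path: str) -> bool:
        # Already banned: reject regardless of what is being requested.
        if ip in self.banned:
            return False
        # First request for a trap path: ban permanently and reject.
        if path.lower().startswith(TRAP_PATHS):
            self.banned.add(ip)
            return False
        return True
```

Usage: call `allow(ip, path)` once per incoming request and serve the page only when it returns `True`. A ban like this only pays off when the same address comes back after tripping a trap.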
afinlayson 2 hours ago
At some point there needs to be a check for whether it's a real human... But it's a cat-and-mouse game: any way we create to keep bots off gets a workaround from clever engineers.
wiseowise 2 hours ago
Don’t worry, man, once AGI is here you’ll get your allowance (or whatever the hyperscalers’ plan is).