cwbriscoe 2 hours ago
I am not well versed in this problem, but can't web servers rate-limit these crawlers/scrapers by their known IP addresses?
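For reference, per-IP rate limiting of the kind asked about here is commonly some variant of a token bucket keyed on the client address. Below is a minimal sketch, assuming a single process with an in-memory dict; the class name `PerIPRateLimiter` and its parameters are hypothetical, and a real deployment would more likely do this in a reverse proxy or a shared store.

```python
import time
from dataclasses import dataclass


@dataclass
class _Bucket:
    tokens: float  # tokens currently available for this IP
    last: float    # clock reading at the last refill


class PerIPRateLimiter:
    """Hypothetical token-bucket limiter keyed by client IP (in-memory only)."""

    def __init__(self, rate_per_sec: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_sec  # tokens refilled per second
        self.burst = burst        # maximum bucket size (allowed burst)
        self.clock = clock        # injectable clock, useful for testing
        self.buckets: dict[str, _Bucket] = {}

    def allow(self, ip: str) -> bool:
        now = self.clock()
        b = self.buckets.get(ip)
        if b is None:
            # First request from this IP: start with a full bucket.
            b = self.buckets[ip] = _Bucket(tokens=float(self.burst), last=now)
        # Refill in proportion to elapsed time, capped at the burst size.
        b.tokens = min(self.burst, b.tokens + (now - b.last) * self.rate)
        b.last = now
        if b.tokens >= 1.0:
            b.tokens -= 1.0
            return True
        return False
```

With `rate_per_sec=1.0, burst=2`, one address gets two requests immediately, is refused on the third, and earns one more after a second. Note that every new address starts with a full bucket, which is exactly the weakness the replies below describe.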
adobrawy 2 minutes ago
They rely on residential proxies powered by botnets, often built by compromising IoT devices (see: https://krebsonsecurity.com/2025/10/aisuru-botnet-shifts-fro... ). In other words, many AI startups, along with the corporations and VC funds backing them, are indirectly financing criminal botnets.
Yoric 12 minutes ago
Not exactly the same problem, but a few months ago I tried to block YouTube traffic from my home by IP (I was writing a parental-control app for my child). After a few hours of collecting IPs, I gave up: YouTube is dynamically load-balanced across millions of IPs, some of which also serve traffic for other Google services I didn't want to block. I wouldn't be surprised if it were the same with LLM crawlers: millions of workers allocated dynamically on AWS, with varying IPs. In my case, since I was dealing with browser-initiated traffic, I wrote a Firefox add-on instead. There's no such shortcut for web servers, though.
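The add-on route sidesteps the IP problem because a browser extension sees the hostname of each request, not just the address it resolves to. Here is a minimal sketch of that hostname-matching idea in Python. It is not the commenter's add-on: the function name `is_blocked` and the domain list are illustrative assumptions, not a complete inventory of YouTube hosts. In an actual Firefox extension, a check like this would typically sit in a JavaScript `browser.webRequest.onBeforeRequest` listener.

```python
from urllib.parse import urlsplit

# Illustrative blocklist (assumption): not a complete list of YouTube hosts.
BLOCKED_DOMAINS = {"youtube.com", "youtu.be", "googlevideo.com"}


def is_blocked(url: str, blocked: set[str] = BLOCKED_DOMAINS) -> bool:
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    # urlsplit().hostname is already lowercased; drop a trailing FQDN dot.
    host = (urlsplit(url).hostname or "").rstrip(".")
    labels = host.split(".")
    # Check every suffix: "www.youtube.com" -> "youtube.com" -> "com".
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))
```

Matching whole label suffixes, rather than using a plain `endswith`, keeps a host like `notyoutube.com` from being caught by the `youtube.com` entry.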
strogonoff an hour ago
You cannot block LLM crawlers by IP address, because some of them use residential proxies. Sources: 1) a friend who admins a moderately popular site and has decent bot-detection heuristics; 2) just Google “residential proxy LLM”, they are not exactly hiding it. Strip-mining original intellectual property for commercial use is big business.
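To make the residential-proxy point concrete: a per-IP limit caps each address individually, so a crawler that spreads its requests over enough addresses never trips it. Below is a toy simulation with made-up numbers (not measurements), a hypothetical single-window per-IP limit, and placeholder address labels standing in for a proxy pool.

```python
from collections import Counter


def allowed_requests(request_ips: list[str], per_ip_limit: int) -> int:
    """Count requests that pass a per-IP limit within a single time window."""
    seen: Counter[str] = Counter()
    passed = 0
    for ip in request_ips:
        seen[ip] += 1
        if seen[ip] <= per_ip_limit:
            passed += 1
    return passed


def simulate(total_requests: int, num_ips: int, per_ip_limit: int) -> int:
    # Hypothetical proxy pool: placeholder labels, not real addresses.
    pool = [f"proxy-{i}" for i in range(num_ips)]
    # The crawler rotates through the pool round-robin.
    stream = [pool[n % num_ips] for n in range(total_requests)]
    return allowed_requests(stream, per_ip_limit)


# 10,000 requests against a limit of 10 per IP per window:
print(simulate(10_000, 1, 10))      # one IP: 10 pass
print(simulate(10_000, 500, 10))    # 500 IPs, 20 requests each: 5000 pass
print(simulate(10_000, 2_000, 10))  # 2,000 IPs, 5 requests each: all 10000 pass
```

Once the pool is large enough that each address stays under the limit, the limiter lets everything through. Detection then has to rely on other signals, which is presumably where heuristics like the ones mentioned above come in.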
ninja3925 2 hours ago
Large cloud providers could offer that solution, but then crawlers can also cycle through IPs.