I am not well versed in this problem, but can't web servers rate-limit by the known IP addresses of these crawlers/scrapers?
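
For readers unfamiliar with the idea, per-IP rate limiting is often done with a token bucket. The following is a minimal sketch only, not any particular web server's implementation; the class name `PerIpRateLimiter`, its parameters, and the example addresses are all assumptions made for illustration.

```python
import time


class PerIpRateLimiter:
    """Token-bucket limiter keyed by client IP (illustrative sketch)."""

    def __init__(self, capacity=10, refill_rate=1.0, clock=time.monotonic):
        self.capacity = capacity        # maximum burst size, in requests
        self.refill_rate = refill_rate  # tokens regained per second
        self.clock = clock              # injectable so the logic can be tested
        self.buckets = {}               # ip -> (tokens, time of last update)

    def allow(self, ip: str) -> bool:
        now = self.clock()
        # An unseen IP starts with a full bucket.
        tokens, updated = self.buckets.get(ip, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at the bucket size.
        tokens = min(self.capacity, tokens + (now - updated) * self.refill_rate)
        if tokens >= 1:
            self.buckets[ip] = (tokens - 1, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False
```

Note that each address gets its own bucket, so a limiter like this only helps when requests are concentrated on a small set of addresses.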

adobrawy 2 minutes ago | parent | next [-]

They rely on residential proxies powered by botnets — often built by compromising IoT devices (see: https://krebsonsecurity.com/2025/10/aisuru-botnet-shifts-fro... ). In other words, many AI startups — along with the corporations and VC funds backing them — are indirectly financing criminal botnets.

Yoric 12 minutes ago | parent | prev | next [-]

Not the exact same problem, but a few months ago, I tried to block YouTube traffic from my home by IP (I was writing a parental-control app for my child). After a few hours of trying to collect IPs, I gave up, realizing that YouTube was dynamically load-balanced across millions of IPs, some of which also served traffic from other Google services I didn't want to block.

I wouldn't be surprised if it was the same with LLMs. Millions of workers allocated dynamically on AWS, with varying IPs.

In my specific case, as I was dealing with browser-initiated traffic, I wrote a Firefox add-on instead. No such shortcut for web servers, though.

bonsai_spool 3 minutes ago | parent [-]

Why not run local DNS on your router and block it there? It can even be per-client with AdGuard Home.
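
To illustrate what a per-client DNS block amounts to, here is a rough sketch of the decision a filtering resolver makes for each query. This is not AdGuard Home's code or configuration format; `BLOCKLISTS`, `is_blocked`, the client address, and the domain list are hypothetical and almost certainly incomplete.

```python
# Hypothetical per-client rules: client IP -> blocked domain suffixes.
BLOCKLISTS = {
    "192.168.1.50": {"youtube.com", "googlevideo.com"},
}


def is_blocked(client_ip: str, qname: str) -> bool:
    """Return True if the queried name falls under a suffix blocked for this client."""
    name = qname.rstrip(".").lower()  # normalize a fully qualified name
    labels = name.split(".")
    # Every suffix on a label boundary, e.g. www.youtube.com -> youtube.com -> com.
    suffixes = {".".join(labels[i:]) for i in range(len(labels))}
    return bool(suffixes & BLOCKLISTS.get(client_ip, set()))
```

Matching on label boundaries means `www.youtube.com` is caught while an unrelated name such as `notyoutube.com` is not, and clients without an entry are never filtered.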

strogonoff an hour ago | parent | prev | next [-]

You cannot block LLM crawlers by IP address, because some of them use residential proxies. Source: 1) a friend admins a slightly popular site and has decent bot detection heuristics, 2) just Google “residential proxy LLM”, they are not exactly hiding. Strip-mining original intellectual property for commercial usage is big business.

skrebbel an hour ago | parent | next [-]

How does this work? Why would people let randos use their home internet connections? I googled it but the companies selling these services are not exactly forthcoming on how they obtained their "millions of residential IP addresses".

Are these botnets? Are AI companies mass-funding criminal malware companies?

fakwandi_priv 16 minutes ago | parent | next [-]

It used to be Hola VPN, which would let you use someone else’s connection and, in the same way, let someone else use yours. This was communicated transparently. That same Hola client would also route traffic for business users. I'm sure many other free VPN clients do the same thing nowadays.

joha4270 39 minutes ago | parent | prev | next [-]

I have seen it claimed that's a way of monetizing free phone apps. Just bundle a proxy and get paid for that.

cuu508 20 minutes ago | parent [-]

A recent HN thread about this: https://news.ycombinator.com/item?id=45746156

stackghost an hour ago | parent | prev [-]

>Are these botnets? Are AI companies mass-funding criminal malware companies?

Without a doubt some of them are botnets. AI companies got their initial foothold by violating copyright en masse with pirated textbook dumps for training data, and whatnot. Why should they suddenly develop scruples now?

globalnode 41 minutes ago | parent | prev [-]

So the user either has a malware proxy running requests unnoticed, or voluntarily signed up as a proxy to make extra $ off their home connection. Either way, I don't care if their IP is blocked. The only problem is that if users behind CGNAT get their shared IP blocked, legitimate users behind the same address may later be blocked too.
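
One possible way to soften that CGNAT side effect, offered only as a sketch and not something anyone in this thread says they run, is to let blocks expire so that a shared address is not penalized indefinitely. The class name `ExpiringBlocklist` and the one-hour default are assumptions.

```python
import time


class ExpiringBlocklist:
    """IP blocklist whose entries lapse after a fixed time-to-live (illustrative)."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock     # injectable so the logic can be tested
        self.expires_at = {}   # ip -> time at which the block ends

    def block(self, ip: str) -> None:
        # Blocking again simply restarts the timer for that address.
        self.expires_at[ip] = self.clock() + self.ttl

    def is_blocked(self, ip: str) -> bool:
        expiry = self.expires_at.get(ip)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self.expires_at[ip]  # forget blocks that have lapsed
            return False
        return True
```

The trade-off is that a shorter time-to-live reduces collateral damage for people sharing the address but also lets an abusive client back in sooner.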

ninja3925 2 hours ago | parent | prev [-]

Large cloud providers could offer that solution, but then crawlers can also cycle IPs.