strogonoff 3 hours ago
You cannot block LLM crawlers by IP address, because some of them use residential proxies. Source: 1) a friend admins a slightly popular site and has decent bot detection heuristics, 2) just Google “residential proxy LLM”; they are not exactly hiding. Strip-mining original intellectual property for commercial usage is big business.
skrebbel 3 hours ago
How does this work? Why would people let randos use their home internet connections? I googled it but the companies selling these services are not exactly forthcoming on how they obtained their "millions of residential IP addresses". Are these botnets? Are AI companies mass-funding criminal malware companies?
globalnode 2 hours ago
So the user either has malware proxying requests without their noticing, or voluntarily signed up as a proxy to make extra $ off their home connection. Either way, I don't care if their IP is blocked. The only problem is CGNAT: many subscribers share one public IP, so blocking it may later block legitimate users too. Edit: ah yes, another person above mentioned VPNs, that's a good possibility. Another vector is mobile users selling their unused data to third parties. There are probably many more ways to acquire endpoints.