| ▲ | krick 15 hours ago |
| So, what's up with these bots, why am I hearing about that so often lately? I mean, DDoS attacks aren't a new thing, and, honestly, this is pretty much the reason why Cloudflare even exists, but I'd expect OpenAI bots (or whatever this is now) to be a little bit easier to deal with, no? Like, simply having a reasonably aggressive fail2ban policy? Or do they really behave like a botnet, where each request comes from a different IP on a different network? How? Why? What is this thing? |
|
| ▲ | recursivecaveat 14 hours ago | parent | next [-] |
| I doubt it's OpenAI. Maaaybe somebody who sells to OpenAI, but probably not. I think they're big enough to do this mostly in-house and properly. Before AI, only big players wanted a scrape of the entire internet, and they could write quality bots, cooperate, behave themselves, etc. Now every third-tier lab wants that data and a billion startups want to sell it, so it's a wild west of bad behavior and bad implementations. They do use residential IP sets as well. |
| |
| ▲ | reppap 16 minutes ago | parent [-] | | Stop just making up excuses for these companies. Other comments on this story have shown the bots are using OpenAI user agents and making requests from OpenAI-owned IP ranges. |
|
|
| ▲ | wseqyrku 6 hours ago | parent | prev | next [-] |
| > this is pretty much the reason why Cloudflare even exists, You said it yourself. If you're selling a cure, you might as well start a plague. |
|
| ▲ | esseph 13 hours ago | parent | prev [-] |
| The dirty secret is a lot of them come through "residential proxies", aka backdoored home routers, IoT devices with shitty security, etc. Basically, the scrapers, who are often third parties themselves, go to these "companies" and buy access to these "residential proxies". Some are more... considerate than others. Why? Data. Every bit of it might be valuable. And not to sound tin foil hatty, but we are getting closer to a post-quantum time (if we aren't already). |
| |
| ▲ | the_biot 3 hours ago | parent | next [-] | | Has this actually been investigated and proven to be true? I see allegations, but no facts really. It seems to me to be just as likely that people are installing LLM chatbot apps that do the occasional bit of scraping work on the sly, covered by some agreed EULA. | | |
| ▲ | Symbiote 2 hours ago | parent | next [-] | | Another likely source is "free" VPN tools, or tools for streaming TV (especially football or other pay-to-view stuff). The tool can make a little money proxying requests at the same time. I can't provide evidence as it's close to impossible to separate the AI bots using residential proxies from actual users, and their IPs are considered personal data. But as the other reply shows, it's easy enough to find people selling this service. | |
| ▲ | esseph 2 hours ago | parent | prev [-] | | Seriously, go to Google. Search for: "residential proxy" ai data scraping. Start reading through thousands of articles. |
| |
| ▲ | tigerlily 9 hours ago | parent | prev [-] | | How can I detect if my router is backdoored, or being used as a residential proxy? | | |
| ▲ | mzajc 4 hours ago | parent | next [-] | | I'm dealing with such an attack, so if you'd like, you can send me IPv4 addresses, and I'll grep my logs for them. Email address is on the website linked on my profile. As for what you can do on your own, it really depends on your network. OpenWrt routers can run tcpdump, so you can check for suspicious connections or DNS requests, but it gets really hard to tell if you have lots of cloud-tethered devices at home. IoT, browser extensions, and smartphone applications are the usual suspects. | |
| ▲ | kimos 9 hours ago | parent | prev [-] | | If it’s legit, you can ask your ISP if they sell use of your hardware. Or just don’t use the provided hardware and instead BYO router or modem or media converter or whatever. But I think what OP is implying is insecure hardware being infected by malware, with access to that hardware sold as a service to disreputable actors. For that, buy a good-quality router and keep it up to date. | | |
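The log check mzajc describes (grepping access logs for specific IPv4 addresses) and krick's question about whether each request comes from a different IP can both be roughed out in a few lines. The following is only a sketch under stated assumptions: it assumes each log line starts with the client IPv4 address, as in the common and combined access-log formats, and the names `extract_ips` and `summarize` are hypothetical helpers invented for illustration, not part of any tool mentioned in the thread. The sample addresses come from the reserved documentation ranges.

```python
import ipaddress
import re
from collections import Counter

# Assumption: the client IPv4 address is the first field on each log line.
IPV4_AT_START = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s")


def extract_ips(log_lines):
    """Yield the client IPv4 address from each line that starts with one."""
    for line in log_lines:
        match = IPV4_AT_START.match(line)
        if not match:
            continue  # not an access-log line we understand
        try:
            yield ipaddress.IPv4Address(match.group(1))
        except ipaddress.AddressValueError:
            continue  # looked like an address but is out of range


def summarize(log_lines, suspects=()):
    """Count requests per IP and per /24, and look up suspect addresses."""
    ips = list(extract_ips(log_lines))
    per_ip = Counter(ips)
    # Grouping by /24 is an arbitrary illustrative choice of network size.
    per_net = Counter(
        ipaddress.ip_network(f"{ip}/24", strict=False) for ip in ips
    )
    wanted = {ipaddress.IPv4Address(s) for s in suspects}
    hits = {str(ip): per_ip[ip] for ip in wanted if ip in per_ip}
    return per_ip, per_net, hits


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET / HTTP/1.1" 200 512',
        '203.0.113.9 - - [01/Jan/2025:00:00:02 +0000] "GET /a HTTP/1.1" 200 512',
        '198.51.100.7 - - [01/Jan/2025:00:00:03 +0000] "GET /b HTTP/1.1" 200 512',
        '203.0.113.5 - - [01/Jan/2025:00:00:04 +0000] "GET /c HTTP/1.1" 200 512',
    ]
    per_ip, per_net, hits = summarize(sample, suspects=["203.0.113.5"])
    print("distinct IPs:", len(per_ip), "distinct /24s:", len(per_net))
    print("suspect hits:", hits)
```

If most addresses show up only once or twice and are spread across many unrelated networks, a per-IP ban threshold of the fail2ban kind has little to latch onto, which is the residential-proxy pattern described upthread.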
|
|