Bender 2 hours ago
I tested this theory not long ago and did not see anything that matched the hype around bots. [1] There are indeed more bots than humans, or at least there appear to be, which is to be expected. Bots crawl everything linked from popular sites, whereas humans only click on things that interest them, and even then they do not typically siphon the entire site. New bot operators appear every day, driven by curiosity and FOMO.

The only thing I saw that could possibly be construed as abusive was some poorly configured RSS bots. Even when my server told a bot that the page would not change for 4 hours, the RSS bots would check every 10 minutes, meaning they ignore the Cache-Control header. This was entirely harmless, just slightly annoying. The RSS bots are not new.

Most of the bots are not even trying to disguise themselves as humans. Most are not programmed to parse Cache-Control headers or rel attributes, or to fetch robots.txt, meaning they only follow the pirate code: a bot will do what a bot can do. I was expecting the bots to mirror a couple of git repositories I exposed, but they did not go deeper than the README.md. None of them.

I think this is the same pattern of catastrophization that exists around AI dooming the world, and I don't know why it is spreading. I guess it must work or people would not do it.

[1] - https://blawg.nochan.net/b/Internet-Crap/20260522-Maybe-AI-B...
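For reference, honoring the cache lifetime described above takes very little code. The following is a minimal sketch, not any particular bot's implementation: the function names and the 10-minute default polling interval are assumptions chosen to match the numbers in the comment, and it only looks at `max-age`, ignoring `no-cache`, `no-store`, `Expires` and conditional requests.

```python
import re
from typing import Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Return the max-age value (in seconds) from a Cache-Control header, or None."""
    # Anchor on start-of-string or a comma so "s-maxage" is not mistaken for "max-age".
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*\"?(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def should_refetch(last_fetch: float, now: float, cache_control: str,
                   default_interval: float = 600.0) -> bool:
    """Decide whether a poller should fetch again.

    default_interval is a hypothetical 10-minute polling period; the poller
    waits for whichever is longer, its own interval or the server's max-age.
    """
    max_age = max_age_seconds(cache_control)
    wait = default_interval if max_age is None else max(max_age, default_interval)
    return now - last_fetch >= wait


# A page marked as fresh for 4 hours (14400 s) is not refetched after 10 minutes.
print(should_refetch(last_fetch=0.0, now=600.0, cache_control="public, max-age=14400"))
```

A feed reader that polls every 10 minutes regardless would skip the `max_age` check and always use its own interval, which is the behavior described in the comment.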
Symbiote an hour ago
My employer's site was recording 1,500 requests per second from a single AI bot earlier this week. At the time I looked, the requests came from 2.4 million different IPs, with 1-2 requests from each IP, most likely all for unique URLs. That single bot was 55% of traffic.

This kind of crawling pushes us to (and sometimes beyond) the limit of our capacity.

I have also seen thousands of requests per hour from a single IP to a small set of pages, e.g. the homepage. I don't know why; it doesn't matter, so I ignore it.

I've recently found there are websites offering curated "AI ready" datasets, and several of these sites claim to have indexed our site. On the 3-4 I looked at, it was one of a few hundred datasets. It's interesting enough to be something an AI company would want, so my conclusion is that the site is being specifically targeted by the AI bot developers.
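The traffic shape described above (a huge number of IPs, each sending only one or two requests) is invisible to per-IP counting and only shows up when requests are grouped by something the crawler has in common. The following is a rough sketch of that comparison, under the assumption that the bot can be grouped by its User-Agent string; the comment does not say how the bot was identified, and `summarize`, the sample addresses and `ExampleBot` are all made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple, Union


def summarize(requests: Iterable[Tuple[str, str]]) -> Dict[str, Union[int, float, str]]:
    """Summarize (ip, user_agent) pairs taken from an access log.

    Returns the busiest single IP's request count alongside the busiest
    user agent's share of all traffic.
    """
    per_ip: Counter = Counter()
    per_agent: Counter = Counter()
    for ip, agent in requests:
        per_ip[ip] += 1
        per_agent[agent] += 1

    total = sum(per_agent.values())
    if total == 0:
        raise ValueError("no requests to summarize")

    top_agent, top_count = per_agent.most_common(1)[0]
    return {
        "total": total,
        "max_requests_per_ip": max(per_ip.values()),
        "top_agent": top_agent,
        "top_agent_share": top_count / total,
    }


# Hypothetical sample: 55 bot requests, each from a different IP, plus 45 other visitors.
sample = [(f"10.0.0.{i}", "ExampleBot") for i in range(55)]
sample += [(f"198.51.100.{i}", "Browser") for i in range(45)]
print(summarize(sample))
```

In this sample no IP sends more than one request, so a per-IP threshold never triggers, yet a single agent still accounts for 55% of the traffic.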