lovehashbrowns 3 hours ago:
On the platform at my work they scrape the same page multiple times, over and over, and don't bother to cache anything. It's ridiculous to account for: everything on our properties is news-based, so warming the cache used to be as simple as loading the first X articles to get them into cache. With AI crawlers that's not viable, because they scrape as much as possible, including articles from 2018 and 2017. Management doesn't want to block them, though, so it's just suffering through the endless barrage. I was able to mitigate a lot of it with heavier caching, even with pgpool, but it's crazy that this small subset of bots accounts for something like 60%+ of our spend.
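A minimal sketch of the cache-warming approach described above, assuming a standard XML sitemap at a placeholder URL and an arbitrary article count (both are hypothetical, not from the comment):

```python
# Cache-warming sketch: pre-fetch the most recent articles so they land in
# the page cache (CDN / reverse proxy) before readers arrive.
# SITEMAP_URL and WARM_COUNT are placeholder assumptions.
import requests
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://example.com/sitemap.xml"  # placeholder
WARM_COUNT = 50  # the "first X articles" from the comment

def warm_cache():
    resp = requests.get(SITEMAP_URL, timeout=10)
    resp.raise_for_status()
    root = ET.fromstring(resp.content)
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    urls = [loc.text for loc in root.findall(".//sm:loc", ns)][:WARM_COUNT]
    for url in urls:
        # A plain GET is enough: the caching layer stores the response.
        requests.get(url, timeout=10)

if __name__ == "__main__":
    warm_cache()
```

The problem described in the comment is that this only helps when traffic concentrates on recent articles; crawlers that walk the whole archive bypass the warmed set entirely.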
spiderfarmer 3 hours ago (reply):
Many are using residential proxies now. It's impossible to block them; not even Google Analytics succeeds. People are sitting on reports thinking their website is suddenly very popular, but it's all random IPs from random locations across the world, requesting one page at a time, at random times of the day.
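A rough sketch of how that traffic pattern shows up in server logs, assuming an nginx/Apache combined-format access log at a placeholder path: when scrapers rotate through residential proxies, most client IPs appear only once, so there is no single address worth blocking.

```python
# Count requests per client IP in a combined-format access log.
# LOG_PATH is a placeholder assumption, not from the comment.
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"  # placeholder

def per_ip_counts(path=LOG_PATH):
    counts = Counter()
    with open(path) as fh:
        for line in fh:
            ip = line.split(" ", 1)[0]  # client IP is the first field
            counts[ip] += 1
    return counts

if __name__ == "__main__":
    counts = per_ip_counts()
    single = sum(1 for c in counts.values() if c == 1)
    print(f"{single}/{len(counts)} IPs made exactly one request")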