psychoslave 5 days ago
Why are they even doing so? This doesn’t feel like something that can bring any value downstream to their own selfish pipelines, or am I missing something?
kstrauser 5 days ago
No! They’re constantly hitting the same stupid URL (“show me this file in this commit in this repo with these 47 query params”) from a few thousand IPs in China and Brazil, with user agents showing an iPod or a Linux desktop running Opera 3. I wrote a little script where I throw in an IP and it generates a Caddy IP-matcher block with an “abort” rule for every netblock in that IP’s ASN. I’m sure there are more elegant ways to share my work with the world while blocking the scoundrels, but this is kind of satisfying for the moment.
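For anyone curious what such a generator might look like, here is a minimal sketch in Python. It is not kstrauser's actual script. It assumes the IP-to-ASN lookup has already happened elsewhere and that you are handing it the ASN's netblocks as CIDR strings. The function name `caddy_abort_block` and the matcher name `@blocked_as<ASN>` are made up for illustration. The output uses Caddy's `remote_ip` matcher and `abort` directive.

```python
import ipaddress


def caddy_abort_block(asn, netblocks):
    """Render a Caddy named matcher plus an abort rule for the given CIDR netblocks.

    `asn` is only used to build a readable matcher name (hypothetical naming scheme).
    `netblocks` is an iterable of CIDR strings assumed to belong to that ASN.
    """
    # Parse and normalize each CIDR; strict=False tolerates host bits being set.
    nets = [ipaddress.ip_network(n, strict=False) for n in netblocks]

    # collapse_addresses() cannot mix IPv4 and IPv6, so merge each family separately.
    merged = []
    for version in (4, 6):
        same_family = [n for n in nets if n.version == version]
        merged.extend(ipaddress.collapse_addresses(same_family))

    if not merged:
        raise ValueError("no netblocks given")

    name = f"@blocked_as{asn}"
    ranges = " ".join(str(n) for n in merged)
    # Named matcher line, followed by the directive that drops matching connections.
    return f"{name} remote_ip {ranges}\nabort {name}\n"


if __name__ == "__main__":
    # Documentation-reserved ranges and a private-use ASN, purely as sample input.
    print(caddy_abort_block(64500, ["192.0.2.0/25", "192.0.2.128/25", "2001:db8::/32"]))
```

The snippet it prints would be pasted (or imported) inside a site block in the Caddyfile. Collapsing adjacent netblocks keeps the matcher line shorter when an ASN announces many contiguous ranges.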
danaris 5 days ago
Best I can figure, they've decided that it's easier to set up their scrapers to simply scrape absolutely everything, all the time, forever than to more carefully select what's worth getting. Various LLM-training scrapers were absolutely crippling my tiny (~125 weekly unique users) browser game until I put its wiki behind a login wall. There is no possible way they could see any meaningful return from doing so.