paulnpace a day ago
It looks like your handle is trolled, because your comments don't appear flagworthy to me.

> The hostmaster path doesn’t scale

This IS the issue: destroying servers because it's inconvenient to coordinate with the administrators. Victory on the scraper end is temporary when you disrespect the people paying for the resources, especially since many of those resources are made available by developers who become emotionally motivated to curtail the scrapers' efforts.

> tarpits often get deployed at the CDN/WAF layer (Cloudflare, Vercel)

Cloudflare and others usually offer exception options.

> Curious to know have you had success with that approach at scale, or more for one-off access agreements?

I'm tiny and only run little personal stuff. I just block vast IP address blocks. For example, blocking DO nearly eliminated the worst slop being sent to my servers. Similarly, I stopped serving over IPv6. I've read what other administrators are doing, and apparently there is something relatively easy to implement on Apache that blocks a lot of scrapers; DokuWiki was having scraper problems that were fixed by this method.
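For what it's worth, the IP-block approach above can be expressed directly in Apache 2.4 authorization rules. This is only a minimal sketch, not the specific method the comment alludes to: the CIDR ranges are RFC 5737 documentation placeholders (you'd substitute the provider's published allocations, e.g. from its ASN), and the user-agent tokens are examples of known crawler identifiers, not an exhaustive list.

```apache
# Tag requests whose User-Agent matches known crawler tokens
# (example tokens; extend as needed). Requires mod_setenvif.
SetEnvIfNoCase User-Agent "(GPTBot|CCBot|Bytespider)" bad_bot

<Location "/">
    <RequireAll>
        Require all granted
        # Placeholder CIDR ranges -- replace with the hosting
        # provider's real allocations. Requires mod_authz_host.
        Require not ip 203.0.113.0/24
        Require not ip 198.51.100.0/24
        # Reject anything tagged above. Requires mod_authz_core.
        Require not env bad_bot
    </RequireAll>
</Location>
```

Blocking at this layer still costs a request slot per hit; dropping the ranges in a firewall (or, as the comment notes, at the CDN with exceptions carved out) is cheaper at volume.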
angelhadjiev a day ago | parent
;] Not a bot, just sleep-deprived. Spent last night chasing a tarpit at 2am.

You're right that scraping has a bad reputation (still, even though it's one of the top search topics on Google), and some of it is well-deserved. The moral framing is fair in the training-crawler context, but the article's point is about collateral damage to legitimate use cases. Price comparison, research, public data pipelines... these aren't the bad actors; they just look like them. That's the gap worth closing, in my opinion.