paulnpace 2 days ago
Good to know that it's working. One option left out: contact the hostmaster.
angelhadjiev 2 days ago
Fair point. Direct outreach works when you can identify who to contact and they're responsive. In practice, though, most data teams are scraping hundreds of domains, not one. The hostmaster path doesn't scale, and tarpits often get deployed at the CDN/WAF layer (Cloudflare, Vercel) where there's no meaningful human on the other end anyway. Curious whether you've had success with that approach at scale, or only for one-off access agreements?
paulnpace a day ago
It looks like your handle is being trolled, because your comments don't appear flagworthy to me.

> The hostmaster path doesn’t scale

This IS the issue: destroying servers because it's inconvenient to coordinate with the administrators. Victory on the scraper end is temporary when you disrespect the people paying for the resources, especially since many of those resources were made available by developers who become emotionally motivated to curtail the scrapers' efforts.

> tarpits often get deployed at the CDN/WAF layer (Cloudflare, Vercel)

Cloudflare and others usually have exception options.

> Curious to know have you had success with that approach at scale, or more for one-off access agreements?

I'm tiny and only run little personal stuff. I just block vast IP address blocks. For example, blocking DO nearly eliminated the worst slop being sent to my servers. Similarly, I stopped serving on IPv6. From reading what other administrators do, there's apparently a relatively easy Apache-level method that blocks a lot of scrapers; DokuWiki was having scraper problems that were fixed this way.
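For context, Apache-level blocking of the kind described above is usually some combination of IP-range denial and User-Agent filtering. A minimal sketch of both, assuming mod_authz_core and mod_rewrite are enabled (the CIDR range and bot names below are illustrative placeholders, not the actual DokuWiki fix or DO's address space):

```apache
# Deny an entire hosting provider's address range.
# 192.0.2.0/24 is a documentation placeholder; substitute the
# real CIDR blocks you want to drop.
<Location "/">
    <RequireAll>
        Require all granted
        Require not ip 192.0.2.0/24
    </RequireAll>
</Location>

# Reject requests whose User-Agent matches known scraper strings
# (example bot names; maintain your own list).
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|CCBot|Bytespider) [NC]
RewriteRule . - [F,L]
```

Blocking at the web server keeps the rule set in one place, though for very large IP lists an ipset/firewall rule is cheaper than evaluating the match in Apache per request.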
| ||||||||