Use proof-of-work captchas; many are private by default. Look into Private Captcha or Cap captcha.

mootothemax 5 minutes ago | parent | next [-]

Speaking from the scraper’s perspective, I like proof of work; a ten-year-old 96-core server will cost a couple of quid to run for a few hours and will grab an absurd number of pages thanks to the access granted by repeatedly solving proofs of work. Small slick codebases too!

phoronixrly 2 hours ago | parent | prev [-]

How does proof of work stop bots?

stephantul 2 hours ago | parent | next [-]

Because it destroys the economics of scraping. With proof of work it's too expensive, or at least not as economically viable.
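To make the mechanism concrete, here is a minimal hashcash-style sketch of proof of work. It is not the scheme used by Private Captcha, Cap, or any other product named in this thread; the function names, the SHA-256 choice, the 8-byte nonce encoding, and the difficulty of 12 bits are all assumptions picked for illustration. The point it shows is the asymmetry: the client has to grind through many hashes, while the server checks the answer with one.

```python
import hashlib
import itertools
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force a nonce (about 2**difficulty hashes on average)."""
    for nonce in itertools.count():
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


# Hypothetical flow: the server issues a random challenge, the client solves it.
challenge = os.urandom(16)
nonce = solve(challenge, difficulty=12)  # 12 bits is an arbitrary demo value
print(verify(challenge, nonce, difficulty=12))
```

Each extra bit of difficulty roughly doubles the client's expected work while the server's check stays at one hash, which is the knob a site would turn to raise the per-request cost.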

gruez 2 hours ago | parent [-]

Depends on what type of scraping you're trying to stop. For the dumb scrapers that would try to scrape every page on a git forge (for which there are a bazillion pages for a modest project, because of how the site works), yeah, it might deter them enough to stop. For anything high-value (e.g. Reddit comments or retail prices), 10 s of CPU time isn't going to stop them.
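A rough back-of-the-envelope sketch of that trade-off, using the numbers floated in this thread: a 96-core box, 10 s of CPU time per proof, and a couple of quid for a few hours. The specific values of 3 hours and £2, and the assumption that every page fetch needs its own proof, are illustrative guesses, not measurements.

```python
def pages_per_run(cores: int, hours: int, seconds_per_proof: int) -> int:
    """Pages fetched if every page costs one proof and all cores stay busy."""
    return cores * hours * 3600 // seconds_per_proof


def cost_per_thousand_pages(run_cost: float, pages: int) -> float:
    """Cost of 1,000 pages, in whatever currency run_cost is given in."""
    return run_cost / pages * 1000


# Assumed inputs: "a few hours" taken as 3, "a couple of quid" taken as 2.0 GBP.
pages = pages_per_run(cores=96, hours=3, seconds_per_proof=10)
print(pages)
print(round(cost_per_thousand_pages(2.0, pages), 4))
```

Under these assumptions the run covers about a hundred thousand pages at roughly two pence per thousand, which is a real deterrent when crawling a bazillion low-value pages and close to a rounding error for high-value data.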

stephantul 35 minutes ago | parent | next [-]

Sure, the whole premise is exactly that proof of work reduces the value of scraping while having negligible impact on users. If the data is so valuable that bot operators are willing to pay 10 s of CPU time per request, then other measures are necessary.

Nevertheless, even for these high-value cases, you can still argue that it disincentivizes the business model, because scraping becomes less efficient.

pmontra 2 hours ago | parent | prev [-]

It will not scare away bots, but a 10-second wait (CPU or only a sleep) will turn away many real users. "This site is so slow, I'll use something else." A kind of reverse captcha.

Hnrobert42 an hour ago | parent | next [-]

Maybe the proof of work can run in the background.

btown 10 minutes ago | parent [-]

Or it can run as part of a checkout wizard's "verifying your browser and processing your payment, don't close your tab" step.

ray_v 2 hours ago | parent | prev [-]

If it gets too expensive/time-consuming to scrape then it won't happen at scale (as much)?