hinkley a day ago

If it takes them 100 times the average crawl time to crawl my site, that is an opportunity cost to them. Of course, 'time' is fuzzy here because it depends on how they're batching. Most bots pull a fixed number of replies in parallel per target, so if you double your response time, you halve the number of requests per hour they slam you with. That definitely affects your cluster size.
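To put rough numbers on the "double the response time, halve the requests" point, here is a minimal sketch. It assumes the bot keeps a fixed number of requests in flight per target and issues a new one as soon as a reply comes back; `requests_per_hour` and its parameters are made-up names for illustration, not anything a real crawler exposes.

```python
def requests_per_hour(parallel_slots: int, response_seconds: float) -> float:
    """Steady-state request rate for a crawler with a fixed number of
    in-flight requests per target (assumption: a slot is refilled the
    moment its reply arrives)."""
    if parallel_slots < 0:
        raise ValueError("parallel_slots must not be negative")
    if response_seconds <= 0:
        raise ValueError("response_seconds must be positive")
    # Each slot completes 3600 / response_seconds requests per hour.
    return parallel_slots * 3600.0 / response_seconds


if __name__ == "__main__":
    fast = requests_per_hour(parallel_slots=4, response_seconds=0.5)
    slow = requests_per_hour(parallel_slots=4, response_seconds=1.0)
    print(f"0.5 s replies: {fast:.0f} requests/hour")
    print(f"1.0 s replies: {slow:.0f} requests/hour")
```

With 4 slots, going from 0.5 s to 1.0 s replies drops the load from 28800 to 14400 requests per hour. If the bot raises its parallelism when replies get slow, this model no longer holds.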

However, if they split ask and answer (sending the request and handling the reply separately), or if threads for other sites can use the same CPUs while you're dragging your feet returning a reply, then as you say, IO delays alone won't slow them down. You've got to use up their CPU time as well. IO stalls on your end won't accomplish that, but adding some highly compressible gibberish on the sending side potentially could, since it creates more work for them without proportionately increasing your bandwidth bill. But that could be tough to do without increasing your CPU bill.
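A rough sketch of the "highly compressible gibberish" idea, using Python's standard `zlib` module. The filler phrase, the 1 MB size, and the function name `compressible_padding` are all assumptions for illustration; whether a given bot actually decompresses and parses the padding is not something this shows.

```python
import zlib
from typing import Tuple

# Hypothetical filler text; anything highly repetitive compresses well.
FILLER = b"lorem ipsum dolor sit amet "


def compressible_padding(size_bytes: int, level: int = 6) -> Tuple[bytes, bytes]:
    """Return (raw, compressed) padding of exactly size_bytes raw bytes."""
    if size_bytes < 0:
        raise ValueError("size_bytes must not be negative")
    repeats = size_bytes // len(FILLER) + 1
    raw = (FILLER * repeats)[:size_bytes]
    compressed = zlib.compress(raw, level)
    return raw, compressed


if __name__ == "__main__":
    raw, compressed = compressible_padding(1_000_000)
    print(f"raw: {len(raw)} bytes, on the wire: {len(compressed)} bytes")
```

The sender pays for bandwidth on the compressed size while the receiver has to inflate and handle the raw size. Compressing on every request is where the CPU bill comes from; compressing once and reusing the bytes would avoid most of it, at the cost of every padded response looking identical.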

dspillett 14 hours ago

> If it takes them 100 times the average crawl time to crawl my site, that is an opportunity cost to them.

If it takes 100 times the average crawl time per page on your site, and yours is one of many tens (hundreds?) of thousands of sites, many of which may be bigger, then unless they are crawling one site at a time, so that your site stalls the whole queue, such efforts likely amount to no more than statistical noise.
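As a back-of-the-envelope check on the "statistical noise" claim, here is a small sketch. It assumes every site has the same number of pages and the same baseline fetch time, and that only aggregate fetch time matters to the crawler; `fleet_slowdown` is a made-up helper and the site count is illustrative, not measured.

```python
def fleet_slowdown(total_sites: int, slow_factor: float) -> float:
    """Ratio of total fetch time with one slow site to total fetch time
    with none, assuming all sites are otherwise identical."""
    if total_sites < 1:
        raise ValueError("total_sites must be at least 1")
    # (total_sites - 1) sites cost 1 unit each; the slow one costs slow_factor.
    return (total_sites - 1 + slow_factor) / total_sites


if __name__ == "__main__":
    ratio = fleet_slowdown(total_sites=10_000, slow_factor=100)
    print(f"aggregate crawl time grows by {(ratio - 1) * 100:.2f}%")
```

Under those assumptions, one site that is 100 times slower among 10,000 adds about 0.99% to the crawler's total fetch time.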

hinkley 9 hours ago

Again, that delay is mostly about me, and my employer, not the rest of the world.

However, if you are running a SaaS or hosting service with thousands of domain names routing to your servers, then this dynamic becomes a little more important, because now the spider can be hitting you for fifty different domain names at the same time.