eqvinox 6 days ago
TFA, and most comments here, seem to completely miss what I thought was the main point of Anubis: it counters the crawlers' "identity scattering" / sybil'ing / parallel crawling. Any access falls into one of the following categories:

- Client with JS and cookies: the server now has an identity, from the cookie, to apply rate limiting to. Humans should never hit the limit, but crawlers will be slowed down immensely or ejected. The identity can of course be rotated, but only at the cost of solving the puzzle again.

- Amnesiac client (JS but no cookies): every access is now expensive.

- No JS: no access.

The point is to prevent parallel crawling from overloading the server. Crawlers can still start an arbitrary number of parallel crawls, but each one costs something to start and needs to stay below some rate limit. Previously, the server would collapse under thousands of crawler requests per second; that is what Anubis makes prohibitively expensive.
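As a rough sketch of what that per-identity rate limiting could look like on the server side (the cookie name, limits, and use of golang.org/x/time/rate are illustrative assumptions, not Anubis's actual implementation; cookie validation is omitted):

    package main

    import (
        "net/http"
        "sync"
        "time"

        "golang.org/x/time/rate"
    )

    // One limiter per challenge cookie: rotating the identity means solving
    // the puzzle again to obtain a new cookie, so scattering requests across
    // many identities carries a per-identity cost.
    var (
        mu       sync.Mutex
        limiters = map[string]*rate.Limiter{}
    )

    func limiterFor(id string) *rate.Limiter {
        mu.Lock()
        defer mu.Unlock()
        l, ok := limiters[id]
        if !ok {
            l = rate.NewLimiter(rate.Every(time.Second), 5) // e.g. ~1 req/s with a burst of 5
            limiters[id] = l
        }
        return l
    }

    func withRateLimit(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            c, err := r.Cookie("challenge-token") // hypothetical cookie name
            if err != nil {
                // No cookie: the client has not solved the puzzle yet.
                http.Error(w, "solve the challenge first", http.StatusForbidden)
                return
            }
            if !limiterFor(c.Value).Allow() {
                http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
                return
            }
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        mux := http.NewServeMux()
        mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("ok"))
        })
        http.ListenAndServe(":8080", withRateLimit(mux))
    }

Scattering across many identities then means obtaining many cookies, and each cookie costs a puzzle solution.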
qwery 6 days ago
Yes, I think you're right. The commentary about its (presumed, imagined) effectiveness very much assumes that it's designed to be an impenetrable wall[0] -- i.e. to prevent bots from accessing the content entirely.

I think TFA is generally quite good and makes something of a good point about the economics of the situation, but finding that the math shakes out that way should, perhaps, lead one to question one's starting point / assumptions[1]. In other words, who said the websites in question wanted to entirely prevent crawlers from accessing them? The answer is: no one. Web crawlers are and have been fundamental to accessing the web for decades. So why are we talking about trying to do that?

[0] Mentioning an 'impenetrable wall' probably sets off alarm bells, because of course that would be a bad design.

[1] (Edited to add:) I should say 'to question one's assumptions more' -- like I said, the article is quite good, and it does at least present this as confusing.
thayne 6 days ago
You don't necessarily need JS; you just need something that can detect when Anubis is in use and complete the challenge.
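For illustration, a hashcash-style challenge of the general shape Anubis uses (find a nonce whose SHA-256 hash starts with some number of zero hex digits) can be completed from any script, no browser or JS engine required. The challenge string, difficulty, and function name below are made up for the example:

    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "strings"
    )

    // solve searches for a nonce such that sha256(challenge + nonce) starts
    // with `difficulty` zero hex digits -- the general hashcash shape.
    func solve(challenge string, difficulty int) (int, string) {
        prefix := strings.Repeat("0", difficulty)
        for nonce := 0; ; nonce++ {
            sum := sha256.Sum256([]byte(challenge + fmt.Sprint(nonce)))
            h := hex.EncodeToString(sum[:])
            if strings.HasPrefix(h, prefix) {
                return nonce, h
            }
        }
    }

    func main() {
        // Hypothetical challenge string and difficulty, for illustration only.
        nonce, hash := solve("example-challenge", 4)
        fmt.Println("nonce:", nonce, "hash:", hash)
    }

The crawler still pays the compute cost per identity; it just doesn't need a headless browser to do so.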
dlenski 2 days ago
> Crawlers can still start an arbitrary number of parallel crawls, but each one costs to start and needs to stay below some rate limit.

This is a nice explanation. It's much clearer than anything I've seen offered by Anubis’s authors in terms of why or how it could be effective at preventing a site from being ravaged by hordes of ill-behaved bots.
rocqua 6 days ago
This is a good point, presuming the rate limiting is actually applied.
IshKebab 6 days ago
Well maybe, but even then, how many parallel crawls are you going to do per site? 100 maybe? You can still get enough keys to do that for all sites in just a few hours per week. |
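A back-of-envelope check of that estimate, with every figure an assumption rather than a measured number:

    package main

    import "fmt"

    func main() {
        // All of these figures are assumptions for a rough estimate, not
        // measurements from the thread or from Anubis itself.
        const secondsPerChallenge = 1.0 // assumed PoW cost per identity
        const keysPerSite = 100.0       // parallel identities per site, per the comment
        const sites = 1000.0            // assumed number of sites being crawled

        cpuSeconds := secondsPerChallenge * keysPerSite * sites
        fmt.Printf("%.0f CPU-seconds ≈ %.1f CPU-hours per refresh\n",
            cpuSeconds, cpuSeconds/3600) // ~27.8 CPU-hours
    }

That is a few wall-clock hours on a handful of cores, roughly in line with the comment's estimate, though the real cost depends entirely on the puzzle difficulty and how often identities get rate-limited and need replacing.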