qualeed 2 days ago
There's no reason not to respect it. If your browser behaves, it's not going to be excluded in robots.txt. If your browser doesn't behave, you should at least respect robots.txt. If your browser doesn't behave, and you continue to ignore robots.txt, that's just... shitty.
lolinder 2 days ago
> If your browser behaves, it's not going to be excluded in robots.txt.

No, it's common practice to allow Googlebot and deny all other crawlers by default [0].

This is within their rights when it comes to true scrapers, but it's part of why I'm very uncomfortable with the idea of applying robots.txt to what are clearly user agents. It sets a precedent where it's not inconceivable that we have websites curating allowlists of user agents like they already do for scrapers, which would be very bad for the web.

[0] As just one example: https://www.404media.co/google-is-the-only-search-engine-tha...
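
For illustration, the "allow Googlebot, deny everyone else" pattern described above could look roughly like the sketch below. The robots.txt body is made up for the example rather than taken from any real site, `ExampleBrowser` is a hypothetical agent name, and Python's standard `urllib.robotparser` is just one convenient way to evaluate the rules.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, not copied from a real site):
# Googlebot may fetch everything, every other agent is denied everything.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBrowser" is a hypothetical, well-behaved agent that is not on the allowlist.
    for agent in ("Googlebot", "ExampleBrowser"):
        print(agent, can_fetch(agent, "https://example.com/page"))
```

Under these rules, an agent that honors robots.txt is locked out unless it is on the allowlist, no matter how politely it behaves, which is the concern with applying the same mechanism to user agents.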