namegulf 4 hours ago
Robots.txt is lame BTW; there is no way to enforce it. It is up to the bot to decide whether to crawl or not, and in most cases they don't care. Cloudflare had a nice technique to address the bot problem (if you use their name servers). It'll respect and use the robots.txt while sending the remaining bots to a deep black hole.
input_sh 4 hours ago
Yes, we know, its purpose is to guide the bots, not forcibly block them. That said, one of the biggest websites in the world not respecting it is definitely a noteworthy story. Hopefully another one of the biggest websites in the world (formerly known as Twitter) eventually respects it as well instead of not even disclosing itself via a user agent and pretending to be Safari running on iOS.
marginalia_nu 2 hours ago
Robots.txt is great if you're trying to run an above-board operation. Much easier than trying to guess how a webmaster wishes the crawler to behave, and then getting angry emails when you guess wrong.
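For anyone curious what "respecting it" looks like on the crawler side, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the bot names (`PoliteBot`, `ExampleBot`) and the `example.com` URLs are all made up for illustration; a real crawler would fetch the file from the site it is about to crawl. Note that the check is entirely voluntary, which is the point made upthread: nothing stops a bot from skipping it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this sketch);
# a real crawler would download it from https://<host>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for agent, url in [
        ("PoliteBot", "https://example.com/index.html"),
        ("PoliteBot", "https://example.com/private/page"),
        ("ExampleBot", "https://example.com/index.html"),
    ]:
        verdict = "allowed" if may_crawl(rules, agent, url) else "disallowed"
        print(f"{agent} -> {url}: {verdict}")
```

A well-behaved crawler would call something like `may_crawl` before every request and simply skip disallowed URLs.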
llbbdd 3 hours ago
Yeah, robots.txt is a great example of the type of solution invented by people who don't understand incentives whatsoever.