kerkeslager 2 days ago
Question: do these bots not respect robots.txt? I haven't added these scrapers to my robots.txt on the sites I work on yet because I haven't seen any problems. I would run something like this on my own websites, but I can't see selling my clients on running this on their websites. The websites I run generally have a honeypot page which is linked in the headers and disallowed to everyone in the robots.txt, and if an IP visits that page, they get added to a blocklist which simply drops their connections without response for 24 hours.
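For readers who want to picture the honeypot setup described above, here is a minimal sketch in Python. It is not kerkeslager's actual implementation: the path `/trap/`, the class name `HoneypotBlocklist`, and the method `should_drop` are all hypothetical names chosen for illustration, and a real deployment would more likely drop connections at the firewall or web server level rather than in application code.

```python
import time

# Hypothetical honeypot path. The idea is that robots.txt disallows it for
# every crawler, for example:
#   User-agent: *
#   Disallow: /trap/
HONEYPOT_PATH = "/trap/"
BLOCK_SECONDS = 24 * 60 * 60  # 24 hours, as described in the comment


class HoneypotBlocklist:
    """In-memory blocklist keyed by client IP (illustrative sketch only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the expiry can be tested
        self._blocked_until = {}     # ip -> time at which the block expires

    def should_drop(self, ip, path):
        """Return True if the request from `ip` for `path` should be dropped."""
        now = self._clock()
        expiry = self._blocked_until.get(ip)
        if expiry is not None:
            if now < expiry:
                return True                   # still inside the 24-hour window
            del self._blocked_until[ip]       # block has expired, forget it
        if path == HONEYPOT_PATH:
            # A client that ignores robots.txt and fetches the trap is blocked.
            self._blocked_until[ip] = now + BLOCK_SECONDS
            return True
        return False
```

A request handler would call `should_drop(client_ip, request_path)` first and close the connection without a response whenever it returns `True`. Well-behaved crawlers never request the disallowed path, so only clients that ignore robots.txt end up on the list.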
0xf00ff00f 2 days ago
> The websites I run generally have a honeypot page which is linked in the headers and disallowed to everyone in the robots.txt, and if an IP visits that page, they get added to a blocklist which simply drops their connections without response for 24 hours.

I love this idea!
jonatron 2 days ago
You haven't seen any problems because you created a solution to the problem!
throw_m239339 2 days ago
> Question: do these bots not respect robots.txt?

No, they don't, because there is no potential legal liability for not respecting that file in most countries.