lolinder | 2 days ago
robotstxt.org [0] is pretty specific about what constitutes a robot for the purposes of robots.txt:

> A robot is a program that automatically traverses the Web's hypertext structure by retrieving a document, and recursively retrieving all documents that are referenced.

This is absolutely not what you are doing, which means what you have here is not a robot. What you have here is a user agent, so you don't need to pay attention to robots.txt. If what you are doing counted as robotic traffic, then so would:

* Speculative loading (an algorithm guesses what you're going to load next and fetches it in advance for faster load times).

* Reader mode (an algorithm transforms the website to strip out content you don't want and presents you with only the minimum you wanted to read).

* Terminal-based browsers (they don't render images or JavaScript, thus bypassing advertising, which by some justifications makes them robots because they bypass monetization).

The fact is that the web is designed to be navigated by a diverse array of user agents that behave differently. I'd seriously consider imposing rate limits on how frequently your browser acts so you don't knock over a server (that's just good citizenship; see the sketch below), but robots.txt is not designed for you, and if we act like it is then a lot of dominoes will fall.
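To make the rate-limiting suggestion concrete, here is a minimal sketch of a per-host throttle. It assumes a Python agent using the `requests` library; the class and parameter names (`PoliteFetcher`, `min_interval`) are illustrative, not anything from the original project.

```python
# Hypothetical sketch: wait at least `min_interval` seconds between
# requests to the same host so the agent never hammers one server.
import time
from collections import defaultdict
from urllib.parse import urlparse

import requests  # assumed dependency


class PoliteFetcher:
    """Fetch URLs while spacing out requests per host."""

    def __init__(self, min_interval: float = 2.0):
        self.min_interval = min_interval
        self.last_hit = defaultdict(float)  # host -> timestamp of last request

    def get(self, url: str) -> requests.Response:
        host = urlparse(url).netloc
        elapsed = time.monotonic() - self.last_hit[host]
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_hit[host] = time.monotonic()
        return requests.get(url, timeout=10)


fetcher = PoliteFetcher(min_interval=2.0)
page = fetcher.get("https://example.com/some/page")
```

The point isn't the exact numbers; it's that a user agent driven by one person naturally makes requests at human pace, and a small delay like this keeps it that way even when it acts on the user's behalf.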