socalgal2 an hour ago

robots.txt means they shouldn't auto-scan your site. Any user, though, can go to the Wayback Machine and type in a URL, and the Wayback Machine will read that URL. That was the intent of robots.txt: "don't scan", not "don't read, period". It's spelled out in the spec for robots.txt.
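
For what it's worth, here is roughly what the "don't scan" check looks like from a crawler's side, as a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the example.com URLs, and the `crawler_may_fetch` helper are made up for illustration; this is not how any particular archive's crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def crawler_may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(crawler_may_fetch("archive.org_bot", "https://example.com/private/page"))
    print(crawler_may_fetch("archive.org_bot", "https://example.com/public/page"))
```

Nothing here stops a fetch. The check is something a well-behaved crawler chooses to run before requesting a page, which is why a user-initiated read is a separate question.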

keane 14 minutes ago | parent [-]

The <meta name="robots"> tag and robots.txt serve different roles: robots.txt controls crawling, while the robots meta tag influences indexing and other behavior. https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...

I wonder how archive.org_bot behaves when <meta name="robots" content="noindex, noarchive, nocache" /> is present.
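
I don't know what archive.org_bot actually does with it, but here is a rough sketch of what reading that tag involves, using Python's standard `html.parser`. The `RobotsMetaParser` class and `robots_directives` function are hypothetical names for illustration, not anything from a real crawler.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <meta ... /> also end up here by default.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated; normalize whitespace and case.
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noindex, noarchive, nocache" /></head>'
    print(sorted(robots_directives(page)))
```

Note the ordering this implies: the page has to be fetched before the tag can be read, so the meta tag can only affect what happens after a crawl, whereas robots.txt is consulted before one.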