Remix.run Logo
cmeacham98 3 hours ago

Correct. Example snippet from the nytimes.com robots.txt:

    User-agent: archive.org_bot
    Disallow: /
joecool1029 2 hours ago | parent [-]

Which they don’t respect. I’ve had it for my blog for years and they still added it to wayback machine, see my last comment for their official announcement of the ignore robots.txt policy, it is not new.

socalgal2 15 minutes ago | parent [-]

robots.txt means they shouldn't auto-scan your site. Any user though can go to the wayback machine and type in a URL and the wayback machine will read that URL. That was the intent of robots.txt (don't scan) not (don't read period). It's spelled out in the spec for robots.txt