They already remove “inconvenient” webpages from the Wayback Machine if someone asks nicely enough. If I remember correctly, when you use it to save a software company’s documentation pages, or evidence of something embarrassing like a potential data breach, they may remove the capture if the company asks. I think Oracle might have done something like this before.

tech234a 4 days ago | parent | next [-]

A community-maintained list collecting examples of such exclusions: https://wiki.archiveteam.org/index.php/List_of_websites_excl...

genter 4 days ago | parent | prev | next [-]

Can't say I blame them, I wouldn't want to go up against Oracle's lawyers either.

01HNNWZ0MV43FF 4 days ago | parent | prev [-]

If anyone reading knows an easy way to download and mirror IA pages, please make it easier to find. A bot told me they offer downloads of the underlying WARC files but I could not find it.

duskwuff 4 days ago | parent | next [-]

> A bot told me they offer downloads of the underlying WARC files but I could not find it

The "bot" is wrong. Most of the crawl data used by the Internet Archive, particularly the Alexa crawls, isn't publicly accessible. (This is because some of it includes archived pages which have since been suppressed by the site owner - removing those pages from the archived crawl data isn't practical.) https://archive.org/details/alexacrawls

Common Crawl data is public, but less comprehensive than IA - https://commoncrawl.org/

fancy_pantser 4 days ago | parent | prev | next [-]

There are utilities to help, waybackpack comes to mind, but I haven't looked in a while. https://github.com/jsvine/waybackpack

pabs3 4 days ago | parent | prev | next [-]

I used wayback-machine-downloader, I think you need one of the forks to make it work though. https://github.com/hartator/wayback-machine-downloader

badlibrarian 4 days ago | parent | prev [-]

They locked away most .warc files due to the AI harvesting crunch.
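
For anyone who wants to script the mirroring question above by hand, here is a minimal sketch in Python. It assumes the Wayback Machine's public CDX search endpoint and the `id_` URL modifier for fetching a capture without the toolbar rewriting. The function names are made up for illustration, and this does not give access to the underlying WARC files; it only lists and addresses captures that are still publicly replayable.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public endpoint of the Wayback Machine CDX index.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, limit=5):
    """Build a CDX query URL that lists captures of `url` as JSON."""
    params = {"url": url, "output": "json", "limit": limit}
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_rows(rows):
    """Convert CDX JSON (one header row, then data rows) into dicts."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def raw_snapshot_url(timestamp, original):
    """Address one capture; `id_` requests the page as originally archived."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"


def list_snapshots(url, limit=5):
    """Query the CDX index over the network and return capture records."""
    with urlopen(build_cdx_query(url, limit), timeout=30) as resp:
        return parse_cdx_rows(json.load(resp))


# Example usage (performs a network request, so it is left commented out):
# for snap in list_snapshots("example.com"):
#     print(raw_snapshot_url(snap["timestamp"], snap["original"]))
```

Each printed URL can then be fetched with any HTTP client and saved locally. Be gentle with request rates, and expect suppressed pages to be missing from the results.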