MPSimmons 5 days ago

Is it likely that the Executive Branch would try to exert control over it to remove "inconvenient" data?

layman51 5 days ago

They already remove “inconvenient” webpages on the Wayback Machine if someone asks nicely enough. If I remember correctly, if you use it to save a software company’s documentation pages or evidence of something embarrassing like a potential data breach, they could remove it if the company asks. I think Oracle might have done something like this before.

tech234a 4 days ago

A community-maintained list collecting examples of such exclusions: https://wiki.archiveteam.org/index.php/List_of_websites_excl...

genter 4 days ago

Can't say I blame them, I wouldn't want to go up against Oracle's lawyers either.

01HNNWZ0MV43FF 4 days ago

If anyone reading knows an easy way to download and mirror IA pages, please make it easier to find. A bot told me they offer downloads of the underlying WARC files but I could not find it.

duskwuff 4 days ago

> A bot told me they offer downloads of the underlying WARC files but I could not find it

The "bot" is wrong. Most of the crawl data used by the Internet Archive, particularly the Alexa crawls, isn't publicly accessible. (This is because some of it includes archived pages that have since been suppressed by the site owner, and removing those pages from the archived crawl data isn't practical.)

https://archive.org/details/alexacrawls

Common Crawl data is public, but less comprehensive than IA - https://commoncrawl.org/

fancy_pantser 4 days ago

There are utilities to help (waybackpack comes to mind), but I haven't looked in a while. https://github.com/jsvine/waybackpack

pabs3 4 days ago

I used wayback-machine-downloader; I think you need one of the forks to make it work, though.

https://github.com/hartator/wayback-machine-downloader
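For anyone who wants to script this without either tool, here is a minimal Python sketch (not taken from waybackpack or wayback-machine-downloader) of the general approach: ask the Wayback Machine's public CDX API for the list of captures of a URL, then fetch each capture's raw bytes. It assumes the CDX endpoint at web.archive.org/cdx/search/cdx with `output=json` (first row is the field names), the `filter` and `collapse` query parameters, and the `id_` flag in snapshot URLs that serves the capture without the Wayback toolbar. The helper names are hypothetical, and it only sees what the public Wayback Machine serves, not the locked-away WARC files discussed above.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed public CDX endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(target, limit=None):
    """Build a CDX query listing successful captures of `target` (hypothetical helper)."""
    params = {
        "url": target,
        "output": "json",             # JSON rows; first row holds the field names
        "filter": "statuscode:200",   # keep only HTTP 200 captures
        "collapse": "digest",         # skip consecutive captures with identical content
    }
    if limit is not None:
        params["limit"] = str(limit)
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_json(text):
    """Turn CDX JSON output into a list of dicts keyed by the header row."""
    if not text.strip():
        return []  # no captures: body may be empty
    rows = json.loads(text)
    if not rows:
        return []  # no captures: body may be "[]"
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def raw_snapshot_url(record):
    """URL for the archived bytes of one capture; `id_` suppresses the toolbar rewrite."""
    return f"https://web.archive.org/web/{record['timestamp']}id_/{record['original']}"


def list_captures(target, limit=None):
    """Fetch and parse the capture list (makes a network request)."""
    with urllib.request.urlopen(cdx_query_url(target, limit), timeout=30) as resp:
        return parse_cdx_json(resp.read().decode("utf-8"))
```

Usage would be something like `for rec in list_captures("example.com/docs/", limit=50): print(raw_snapshot_url(rec))`, then downloading each URL politely (with delays), since the Archive rate-limits heavy clients.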

badlibrarian 4 days ago

They locked away most .warc files due to the AI harvesting crunch.

toomuchtodo 5 days ago

It's a one way street. This provides more access to materials held by the federal gov for ingest into IA's storage system. Bit of a policy interconnect, if you will. Reminder to donate to the Archive.

themgt 4 days ago

If you see a bank that says "federally chartered" or "federal deposit insurance corporation", stay clear!

jahewson 5 days ago

Doubtful. They’re not part of the government, so the First Amendment applies.

chrisg23 5 days ago

I've heard it has already happened. Specifically, the Internet Archive removed videos of TempleOS developer Terry Davis's livestreams because of problematic content.

If the Internet Archive already curates its content, then yeah, there is a 100% chance that there will be more curation.

jazzyjackson 5 days ago

Kiwifarms as well. They are a bit of a pushover when it comes to controversy.

jprd 5 days ago

I thought the Archive just removed access but kept the content. I know that from a user's perspective that is a distinction without a difference, but for posterity it matters.

Does anyone have any facts/citations on whether this is a myth/coping mechanism I created, or reality?

cwillu 5 days ago

“2023 The Internet Archive, a non-profit research library, makes use of internal processes and tools, including human review and hash-matching, as well as reports from external parties to identify, disable access to, and limit the reappearance of illegal and/or proscribed violent extremist material on archive.org”

https://help.archive.org/help/tco-transparency-reports/

chrisg23 4 days ago

I wonder how many gems like this one (https://archive.org/details/youtube-moXX8lbnmHs) that could have been saved have been lost. (Obviously this one is saved, for now.)

This is not to disparage the tremendous work done and being done by the IA; it's more me lamenting the trend in our society (and societies generally) to mentally babysit people lest their minds be exposed to something bad, with the implicit assumption that adult humans can't be trusted to see some stupid bs and react with "that was some stupid bs. I am moving it into the stupid bs bucket of things I know about".

badlibrarian 4 days ago

In the past, they stated that they do not delete anything. Those posts have vanished, possibly due to the onslaught of lawsuits and discovery. As for Kiwi Farms (and some other material), I was able to locate it by poking around on the site. Even the material that the judge ruled against in the Hachette lawsuit remains online and available to people with print disabilities.

BSOhealth 5 days ago

Given this is already happening with many other taxpayer-funded datasets, it would be pretty on brand for this group.

odo1242 5 days ago

I mean, what would they do to exert control? Remove their federal depository status?

ranger_danger 5 days ago

Imagine having to delete their 100PB of warez.

rwmj 5 days ago

Wait til you hear about my local library. You can walk in and read or borrow any book without paying!

1659447091 4 days ago

Sounds outdated! My library doesn't even require me to walk in anymore. They send any book I want to read or listen to straight to my phone, and if they don't have it, I can request that they acquire it and send it to me for free.

natas 5 days ago

I wish my public library was free...

GeorgeTirebiter 5 days ago

Sounds VERY Communist, or Socialist, or some other scary thing. Are you sure it's legal? Why, the AUTHORS and PUBLISHERS are being denied the revenues they would get if you would buy the book, or at least rent it. So, are libraries theft of Authors' and Publishers' remuneration? (And, to think, the richest man in the world at the time, Andrew Carnegie, endowed so many Libraries!)

nope577 4 days ago

Your shift key seems to keep getting stuck.

NoMoreNicksLeft 4 days ago

Wait until you hear about my private library that resides on a Synology NAS. I can access it from anywhere in the world, on any device, and it's filled with whatever titles I can be bothered to decide I want. I have about 20,000 (not counting periodicals), all carefully curated and retail quality. I even got rid of those annoying generic Bantam Press covers and replaced them with the high-res ones from the publisher's site.

Not sure what the appeal of the public library is, when you can have your own.
