typpilol 8 hours ago

You're telling me they have no redundant copies at all in another location?

I could see that due to the sheer size, but I'm sure they have a robust disk pool that would take a lot to lose data.

cookiengineer 6 hours ago

In my opinion, redundancy within a single business entity is no redundancy at all, especially when there are legal obligations under a soon-to-be-burning-books-again regime.

A better strategy would have been to found independent entities in other liberal democracies, so they can act as IP backups.

There was a great VPRO documentary called "Digital Amnesia" [1] in which they also interviewed the head of the Library of Alexandria, who at the time was the only bidder for the national KIT library of the Netherlands and its dissolved inventory.

It features interviews with archivists, librarians, web archivists, and others on the topic. It's insane to see that nations don't want to preserve their history, science, and culture anymore.

But here we are.

[1] https://youtube.com/watch?v=NdZxI3nFVJs

typpilol 6 hours ago

Do we know how much data the Internet Archive has?

Is it even viable to replicate it to multiple regions if it's thousands of petabytes?

cookiengineer 6 hours ago

Anna's Archive has the metadata on it.

The IA was around 300 TB last time I checked.

libgen was around 190 TB. For my own at-home cluster I decided to go for 512 TB, but I can't host or upload at those bandwidth requirements from here.
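
For a rough sense of why home bandwidth is the bottleneck, here's a back-of-the-envelope sketch; the 40 Mbit/s upstream figure is just an assumed typical residential uplink, not a number from this thread:

```python
# Rough upload-time estimate for mirroring an archive from a home connection.
# The upstream speed below is an assumption (typical residential uplink), not a measured value.

ARCHIVE_SIZE_TB = 512        # planned at-home cluster capacity
UPSTREAM_MBIT_S = 40         # assumed residential upload speed

size_bits = ARCHIVE_SIZE_TB * 1e12 * 8          # decimal terabytes -> bits
seconds = size_bits / (UPSTREAM_MBIT_S * 1e6)   # transfer time at full line rate
days = seconds / 86_400

print(f"{ARCHIVE_SIZE_TB} TB at {UPSTREAM_MBIT_S} Mbit/s ≈ {days:,.0f} days (~{days / 365:.1f} years)")
# -> 512 TB at 40 Mbit/s ≈ 1,185 days (~3.2 years), ignoring overhead and downtime
```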

I started building something like a torrent-splitter tool yesterday because I realized that all torrent clients just crash when you try to open, modify, or seed torrents that large.
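
A minimal sketch of the planning side of such a tool, assuming the third-party bencodepy package for parsing and a hypothetical torrent file name; it only groups a multi-file torrent's file list into size-capped batches, since producing actually seedable sub-torrents would additionally require piece-aligned boundaries or re-hashing:

```python
# Sketch: read a huge multi-file .torrent and partition its file list into
# size-capped batches, so each batch can be fetched/seeded separately.
# Assumes the third-party "bencodepy" package; batching alone does not yield
# valid sub-torrents, because piece hashes span file boundaries.
import bencodepy

def plan_batches(torrent_path: str, cap_bytes: int = 4 * 1024**4):
    meta = bencodepy.decode(open(torrent_path, "rb").read())
    files = meta[b"info"].get(b"files", [])   # multi-file torrents only
    batches, current, current_size = [], [], 0
    for f in files:
        path = b"/".join(f[b"path"]).decode(errors="replace")
        size = f[b"length"]
        if current and current_size + size > cap_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append((path, size))
        current_size += size
    if current:
        batches.append(current)
    return batches

for i, batch in enumerate(plan_batches("ia-collection.torrent")):  # hypothetical file name
    total = sum(size for _, size in batch)
    print(f"batch {i}: {len(batch)} files, {total / 1024**4:.2f} TiB")
```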

Edit: correction, the IA is ~15 PB; Brewster Kahle mentioned it in the documentary (2014).