SilverElfin 3 days ago

Is there anyone archiving all of reddit? Or twitter? I mean even if their terms have changed to not allow it.

DaSHacka 3 days ago | parent | next [-]

> reddit

There used to be one such project (Pushshift), before the Reddit API change. You can download all the data and see all the info on the-eye, another datahoarder/preservationist group:

https://the-eye.eu/redarcs/

> twitter

Not that I know of, and you haven't even been able to archive tweets on the Wayback Machine for YEARS.

stuffoverflow 3 days ago | parent | prev | next [-]

Academictorrents has monthly dumps of all reddit submissions and comments even after the API restrictions.

pabs3 3 days ago | parent [-]

https://academictorrents.com/browse.php?search=stuck_in_the_...

SilverElfin 3 days ago | parent [-]

Interesting. You don’t have to be an academic to access these I guess?

mkl 2 days ago | parent [-]

They have magnet links and torrent files right there on the pages, so no.

pabs3 3 days ago | parent | prev | next [-]

ArchiveTeam was doing that, but their stuff no longer works due to changes at Reddit. The wiki page about it links to some other groups doing Reddit archiving.

https://wiki.archiveteam.org/index.php/Reddit

Seattle3503 3 days ago | parent | prev | next [-]

ArcticShift is a project with that goal. It picks up where PushShift left off when the API changes killed that project.

https://github.com/ArthurHeitmann/arctic_shift

pabs3 3 days ago | parent | next [-]

Viewer and stats for ArcticShift: https://photon-reddit.com/ https://arctic-shift.photon-reddit.com/

SilverElfin 3 days ago | parent | prev [-]

Thanks. I wonder if anyone does this for hacker news.

mdaniel 2 days ago | parent [-]

I believe there is a dataset in BigQuery, but I haven't looked at it, so I don't know how up to date it is <https://news.ycombinator.com/item?id=10440502>

Given that Firebase (which powers the API link at the bottom of this page) is a Google property, I cannot possibly imagine why they'd differ

9dev 3 days ago | parent | prev [-]

Ask OpenAI maybe?