progbits 11 hours ago

The Sanderson wiki [1] has a time-travel feature that lets you read a snapshot from just before a given book's publication, ensuring no spoilers.

I would like a similar pre-LLM Wikipedia snapshot. Sometimes I would rather have potentially stale or incomplete info than have to wade through slop.

1: https://coppermind.net/wiki/Coppermind:Welcome

csande17 10 hours ago | parent | next [-]

The easiest way to get this is probably Kiwix. You can download a ~100GB file containing all of English Wikipedia as of a particular date, then browse it locally offline.

I'm not sure if it's real or not, but the Internet Archive has a listing claiming to be the dump from May 2022: https://archive.org/details/wikipedia_en_all_maxi_2022-05

embedding-shape 10 hours ago | parent | next [-]

Alternatively, go straight to Wikimedia; those are the dumps I'm using. The format is multistream XML in bz2, which is easy to parse and trivial to parse concurrently. The latest dump (text only) is from 2026-01-01 and weighs 24.1 GB: https://dumps.wikimedia.org/enwiki/20260101/ They also provide splits together with indexes, so you can grab just the few sections you want if 24 GB is too large.
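
For anyone wanting to try the multistream approach, here is a minimal Python sketch of reading one stream by byte offset. It assumes the index lists one `offset:page_id:title` entry per line and that each offset marks the start of an independent bz2 stream of bare `<page>` elements; check that against the dump you actually download. The function names are made up for illustration.

```python
import bz2
import xml.etree.ElementTree as ET


def parse_index_line(line):
    """Split one index line, assumed to look like 'offset:page_id:title'."""
    # Titles can contain colons, so split on the first two only.
    offset, page_id, title = line.rstrip("\n").split(":", 2)
    return int(offset), int(page_id), title


def read_pages_at(dump_path, offset):
    """Decompress the one bz2 stream starting at `offset`; return {title: wikitext}."""
    decomp = bz2.BZ2Decompressor()
    chunks = []
    with open(dump_path, "rb") as f:
        f.seek(offset)
        # Stop as soon as this stream ends; later streams are left untouched.
        while not decomp.eof:
            block = f.read(64 * 1024)
            if not block:
                break
            chunks.append(decomp.decompress(block))
    # Assumption: the stream holds bare <page> elements, so add a root for the parser.
    root = ET.fromstring(b"<pages>" + b"".join(chunks) + b"</pages>")
    pages = {}
    for page in root.iter("page"):
        pages[page.findtext("title")] = page.findtext("revision/text")
    return pages
```

Because every stream decompresses independently, the offsets from the index can be handed to separate worker processes, which is what makes concurrent parsing straightforward. To look up a single article, find its line in the index, pass the offset to `read_pages_at`, and pick the title out of the returned dict.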

JKCalhoun 7 hours ago | parent | prev [-]

There's a torrent at the linked URL. Trying that right now. (I have a couple of Kiwix dumps of Wikipedia offline already.)

Antibabelic 11 hours ago | parent | prev | next [-]

But you can already view the past version of any page on Wikipedia. Go to the page you want to read, click "View history" and select any revision before 2023.
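
The manual steps above can also be scripted. This is a rough Python sketch, assuming the standard MediaWiki Action API on en.wikipedia.org: it asks for the newest revision at or before a cutoff timestamp and builds a permalink from its revision ID. The helper names, the User-Agent string, and the 2023-01-01 cutoff are illustrative choices, not anything prescribed by Wikipedia.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"


def revision_query_url(title, cutoff="2023-01-01T00:00:00Z"):
    """Build an API query for the newest revision at or before `cutoff`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": 1,
        "rvstart": cutoff,   # start listing at the cutoff...
        "rvdir": "older",    # ...and walk backwards in time
        "rvprop": "ids|timestamp",
        "format": "json",
        "formatversion": 2,
    }
    return API + "?" + urllib.parse.urlencode(params)


def oldid_from_response(data):
    """Pull the revision ID out of a formatversion=2 response, or None."""
    revisions = data["query"]["pages"][0].get("revisions", [])
    return revisions[0]["revid"] if revisions else None


def permalink(title, oldid):
    """URL that shows the page exactly as it was at revision `oldid`."""
    query = urllib.parse.urlencode({"title": title, "oldid": oldid})
    return "https://en.wikipedia.org/w/index.php?" + query


def snapshot_link(title, cutoff="2023-01-01T00:00:00Z"):
    """Network call: look up the pre-cutoff revision and return its permalink."""
    request = urllib.request.Request(
        revision_query_url(title, cutoff),
        headers={"User-Agent": "pre-llm-snapshot-sketch/0.1"},  # hypothetical UA
    )
    with urllib.request.urlopen(request) as response:
        oldid = oldid_from_response(json.load(response))
    return permalink(title, oldid) if oldid is not None else None
```

Calling `snapshot_link("Alan Turing")` should return a link to the last revision saved before 2023, or `None` if the page did not exist yet. Links inside that old revision still point at current pages, so the lookup has to be repeated for every article you visit.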

progbits 11 hours ago | parent [-]

I know, but it's not as convenient if you have to keep scrolling through revisions.

kace91 10 hours ago | parent | prev [-]

Have you personally encountered slop there? I tend to use Wikipedia rabbit holes as a pastime and haven’t really felt a difference.