csande17 10 hours ago
The easiest way to get this is probably Kiwix. You can download a ~100GB file containing all of English Wikipedia as of a particular date, then browse it locally offline. I'm not sure if it's real or not, but the Internet Archive has a listing claiming to be the dump from May 2022: https://archive.org/details/wikipedia_en_all_maxi_2022-05
embedding-shape 10 hours ago
Alternatively, you can get the dumps straight from Wikimedia, which is what I'm using. The format is multistream XML in bz2, which is easy to parse and trivial to parse concurrently. The latest dump (text only) is from 2026-01-01 and weighs 24.1 GB: https://dumps.wikimedia.org/enwiki/20260101/ They also have splits together with indexes, so you can grab just the few sections you want if 24 GB is too large.
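To illustrate why the multistream layout is convenient, here is a minimal Python sketch of pulling a single stream out of the dump without decompressing the whole file. It assumes the index file has one `offset:page_id:title` entry per line, where `offset` is the byte position of a bz2 stream inside the dump; the function names `parse_index_line` and `read_stream` are made up for this example and are not part of any Wikimedia tooling.

```python
import bz2


def parse_index_line(line):
    """Split one index entry into (byte_offset, page_id, title).

    Assumes the layout "offset:page_id:title". Titles can contain
    colons themselves, so only the first two colons are split on.
    """
    offset, page_id, title = line.rstrip("\n").split(":", 2)
    return int(offset), int(page_id), title


def read_stream(dump_path, offset, chunk_size=1 << 16):
    """Decompress the single bz2 stream that starts at `offset`.

    Reads forward in chunks until the decompressor reports the end
    of that stream, so later streams in the file are never decoded.
    """
    decomp = bz2.BZ2Decompressor()
    parts = []
    with open(dump_path, "rb") as f:
        f.seek(offset)
        while not decomp.eof:
            chunk = f.read(chunk_size)
            if not chunk:
                break  # file ended before the stream did
            parts.append(decomp.decompress(chunk))
    return b"".join(parts).decode("utf-8")
```

Because each stream decompresses on its own, distinct offsets taken from the index can be handed to separate worker processes. Under the assumptions above, the returned text should be a fragment of page XML rather than a complete document, so it may need wrapping in a root element before going to an XML parser.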
JKCalhoun 7 hours ago
There's a torrent at the linked URL. Trying that right now. (I have a couple of Kiwix dumps of Wikipedia offline already.)