adsharma 5 hours ago

So this thing is based on Kiwix, which is based on the ZIM file format.

Meanwhile, Wikipedia ships Wikidata, which uses RDF dumps (and those are probably 8x less compressed than they should be).

https://www.wikidata.org/wiki/Wikidata:Database_download

There is room for a third option leveraging commercial columnar database research.

https://adsharma.github.io/duckdb-wikidata-compression/
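For anyone wondering what "columnar" buys you here, below is a minimal sketch of one standard column-store trick: dictionary encoding. It is not the method from the linked post. It splits RDF-style (subject, predicate, object) triples into three columns and replaces each repeated string with a small integer id. The helper names `dictionary_encode` and `to_columns` and the `ex:` identifiers are made up for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Map each distinct string to a small integer id, in first-seen order."""
    ids: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in ids:
            ids[v] = len(ids)
        codes.append(ids[v])
    # dicts preserve insertion order, so list index == assigned id
    dictionary = list(ids)
    return dictionary, codes


def to_columns(triples: List[Triple]) -> Dict[str, Tuple[List[str], List[int]]]:
    """Split row-oriented triples into three independently encoded columns."""
    subjects, predicates, objects = (list(col) for col in zip(*triples))
    return {
        "subject": dictionary_encode(subjects),
        "predicate": dictionary_encode(predicates),
        "object": dictionary_encode(objects),
    }


# Hypothetical sample data with made-up identifiers (not real Wikidata items).
triples = [
    ("ex:alice", "ex:type", "ex:Person"),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:type", "ex:Person"),
    ("ex:carol", "ex:type", "ex:Person"),
]

if __name__ == "__main__":
    for name, (dictionary, codes) in to_columns(triples).items():
        print(name, dictionary, codes)
```

The predicate and object columns end up as runs of small repeated integers (`[0, 1, 0, 0]`). A column store can then bit-pack or run-length encode them far more cheaply than the raw strings. This sketch does not show whether that gets anywhere near the 8x figure.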

jrm4 2 hours ago

And for those who are only vaguely familiar: this ZIM file format is not the same as the one used by Zim (https://zim-wiki.org).

hofrogs 2 hours ago

I am actually only vaguely familiar, and I wondered about that every time I saw the format referenced but never bothered to check. Your comment is informative!