sodafountan 8 hours ago
The GitHub page is no longer available, which is a shame because I'm really interested in how this works. How was the entirety of HN stored in a single SQLite database? In other words, how was the data acquired? And how does the page load instantly if there's 22GB of data that has to be downloaded to the browser?
keepamovin 6 hours ago
You can see it now, I forgot to make it public.

1. download_hn.sh - bash script that queries BigQuery and saves the data to *.json.gz
2. etl-hn.js - does the sharding and the ID -> shard map, plus the user stats shards.
3. Then either npx serve docs or upload to Cloudflare Pages.

The ./tools/predeploy-checks.sh script basically runs the entire pipeline. You can run it unattended with AUTO_RUN=true.