sodafountan 8 hours ago
The GitHub page is no longer available, which is a shame because I'm really interested in how this works. How was the entirety of HN stored in a single SQLite database? In other words, how was the data acquired? And how does the page load instantly if there's 22GB of data that has to be downloaded to the browser?
keepamovin 6 hours ago
You can see it now, I forgot to make it public.

1. download_hn.sh - bash script that queries BigQuery and saves the data to *.json.gz
2. etl-hn.js - does the sharding and the ID -> shard map, plus the user stats shards.
3. Then either npx serve docs or upload to Cloudflare Pages.

The ./tools/predeploy-checks.sh script basically runs the entire pipeline. You can run it unattended with AUTO_RUN=true.