Show HN: Epstein's emails reconstructed in a message-style UI (OCR and LLMs) (github.com)
44 points by toon-noot 2 days ago | 8 comments
This project reconstructs the Epstein email records from the recent U.S. House Oversight Committee releases using only public-domain documents (23,124 image files + 2,800 OCR text files). Most email pages contain only one real message, buried under layers of repeated headers/footers. I wanted to rebuild the conversations without all the surrounding noise.

I used an OCR + vision-LLM pipeline to extract individual messages from the email screenshots, normalize senders/recipients, rebuild timestamps, detect duplicates, and map threads. The output is a structured SQLite database that runs client-side via SQL.js (WebAssembly). The repository includes the full extraction pipeline, data cleaning scripts, schema, limitations, and implementation notes.

The interface is a lightweight PWA that displays the reconstructed messages in a phone-style UI, with links back to every original source image for verification.

Live demo: https://epsteinsphone.org

All source data is from the official public releases; no leaks or private material. Happy to answer questions about the pipeline, LLM extraction, threading logic, or the PWA implementation.
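
For readers curious what the "normalize senders, detect duplicates, store in SQLite" steps could look like, here is a minimal sketch. It is not taken from the repository: the table layout, the column names, the `fingerprint` helper, and the sample rows are all hypothetical, and the duplicate rule (same sender, same timestamp, same whitespace-normalized body) is just one plausible choice.

```python
import hashlib
import re
import sqlite3


def normalize_body(text: str) -> str:
    """Collapse whitespace and lowercase so OCR spacing noise does not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def fingerprint(sender: str, sent_at: str, body: str) -> str:
    """Hash the normalized fields; equal hashes are treated as duplicates (assumption)."""
    key = "\x1f".join([sender.strip().lower(), sent_at, normalize_body(body)])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def build_db(messages):
    """Load messages into an in-memory SQLite database, skipping duplicates."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema; the project's real schema may differ.
    conn.execute(
        """CREATE TABLE messages (
               id INTEGER PRIMARY KEY,
               fingerprint TEXT UNIQUE,
               sender TEXT,
               sent_at TEXT,
               body TEXT,
               source_image TEXT)"""
    )
    for m in messages:
        # The UNIQUE constraint plus INSERT OR IGNORE drops repeated messages.
        conn.execute(
            "INSERT OR IGNORE INTO messages "
            "(fingerprint, sender, sent_at, body, source_image) "
            "VALUES (?, ?, ?, ?, ?)",
            (
                fingerprint(m["sender"], m["sent_at"], m["body"]),
                m["sender"].strip().lower(),
                m["sent_at"],
                m["body"],
                m["source_image"],
            ),
        )
    conn.commit()
    return conn


# Made-up sample rows: the first two differ only in case and spacing.
sample = [
    {"sender": "a@example.com", "sent_at": "2019-01-01T10:00",
     "body": "See you  at noon.", "source_image": "page_0001.jpg"},
    {"sender": "A@example.com ", "sent_at": "2019-01-01T10:00",
     "body": "see you at noon.", "source_image": "page_0002.jpg"},
    {"sender": "b@example.com", "sent_at": "2019-01-01T10:05",
     "body": "Fine.", "source_image": "page_0003.jpg"},
]
conn = build_db(sample)
```

Keeping `source_image` on every row is what would let a UI link each message back to its original page. A real pipeline would probably need fuzzier matching than an exact hash, since OCR errors can change individual characters.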
pfd1986 2 days ago
The convo with Noam Chomsky is interesting. The Deepak Chopra one talking about Trump being 'loco' is quite funny. Neat data visualization solution!
dizhn 2 days ago
Android/Firefox. Nothing's happening when I tap the icons on the demo site.
pea 2 days ago
This is really cool, I enjoyed going through them in this form. Thanks
palmotea 2 days ago
One nit: the message view seems to auto-hyphenate long words on line-breaks to pack in more text, but one of the things that's struck me about Epstein is how utterly incompetent he was with punctuation. Those correctly-inserted hyphens distract from that impression.
marstall 2 days ago
brilliant. feel bad asking for something more - but an inline annotation of who these people are would take it over the top.
lights0123 a day ago
See also: https://jmail.world/