Show HN: OSS AI agent that indexes and searches the Epstein files (epstein.trynia.ai)
59 points by jellyotsiro 6 hours ago | 16 comments
Hi HN, I built an open-source AI agent that has already indexed and can search the entire Epstein files, roughly 100M words of publicly released documents. The goal was simple: make a large, messy corpus of PDFs and text files immediately searchable in a precise way, without relying on keyword search or bloated prompts.

What it does:
- The full dataset is already indexed
- You can ask natural language questions
- Answers are grounded and include direct references to source documents
- Supports both exact text lookup and semantic search

Discussion around these files is often fragmented. This makes it possible to explore the primary sources directly and verify claims without manually digging through thousands of pages.

Happy to answer questions or go into technical details.
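For readers wondering how "exact text lookup" and "semantic search" can sit side by side, here is a minimal sketch. It is not the project's implementation, which the post does not describe. The corpus, the function names, and the scoring are all hypothetical, and term-count cosine similarity stands in for the learned embeddings a real system would likely use.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder corpus; not taken from any real document set.
DOCS = {
    "a": "Invoice for office supplies dated March.",
    "b": "Meeting notes about the office relocation schedule.",
    "c": "Travel itinerary with hotel bookings.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def exact_search(phrase, docs=DOCS):
    """Return ids of documents containing the phrase verbatim (case-insensitive)."""
    needle = phrase.lower()
    return [doc_id for doc_id, text in docs.items() if needle in text.lower()]


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_search(query, docs=DOCS, top_k=2):
    """Rank documents by similarity to the query.

    Assumption: term counts are a crude stand-in for embedding vectors.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), doc_id)
              for doc_id, text in docs.items()]
    # Highest score first; ties broken by document id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, score) for score, doc_id in scored[:top_k] if score > 0]


if __name__ == "__main__":
    print(exact_search("office supplies"))
    print(semantic_search("office schedule"))
```

The exact path answers "does this phrase appear anywhere", while the ranked path tolerates queries that only partially overlap a document. Returning document ids from both is what lets an answer point back at its sources.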
andy_ppp 32 minutes ago
I keep thinking that the lack of children’s faces in the blacked out rectangles make the files much less shocking. I wonder if AI could put back fake images to make clearer to people how sick all this is.
iowemoretohim 3 hours ago
Those are going to be some spicy hallucinations.
wutsthat4 3 hours ago
And what did you learn?
nubg 3 hours ago
Does this work with vector embeddings?
tehjoker 3 hours ago
This is a good idea. One thing I never understand about these kinds of projects though: why are the standard questions provided to the user as prompts never cached?
dfxm12 3 hours ago
> can search the entire Epstein files

It's worth noting that only about 1% of the files have been released, according to the DOJ. Of the released files, many have redactions.