| ▲ | lukeigel a day ago |
| Thanks! And it's a lot of info, yeah. ~90% of new data in yesterday's drop was photographs, which they redacted for us. The House Oversight Committee's giant drop in November had tons of data we still didn't take advantage of even after doing the original Jmail, like flight logs. For the Yahoo release, which is still ongoing, the folks at Drop Site News (see https://www.jmail.world/about) are handling the manual redaction which has been very time consuming, even with tons of AI to help in the background. |
|
| ▲ | dvrp a day ago | parent | next [-] |
| Would be nice to explain at some point how we did the structuring of the destructured data. For now we’re focusing on fixing the bugs because we’re already seeing an insane wave of traffic so most of us are focused on keeping the site alive. |
| |
| ▲ | nsomaru 13 hours ago | parent [-] | | Hey, I’d be interested in your thoughts on this, or the key ideas/research results you relied on: | | |
| ▲ | lukeigel 8 hours ago | parent [-] | | Yes! We used our friends at Reducto (https://reducto.ai/) for all document extraction and parsing (one of the best companies I've ever referred to YC ;) ) We did an initial parsing pass of all four DOJ document batches on Friday. This takes a raw PDF and returns chunks containing typed blocks—each with a type (Title, Text, Figure, etc.), bounding boxes, content, and confidence scores. For PDFs that were just scans of photographs (which was like 90% of new content in Friday's release), it gave in depth descriptions of those! You can type search terms like "door" at https://www.jmail.world/photos to see what I mean. For apps like Jmail and JFlights we use their structured extraction endpoint instead—you define a schema (e.g. {from, to, subject, date, body} for emails or {departure_airport, arrival_airport, passengers[], date} for flights) and it pulls those fields directly into JSON. The JFlights example served as the best ad for Reducto and how doc parsing technology can speed up hours of journalistic investigations like this. See for yourself. Given this document https://www.jmail.world/drive/HOUSE_OVERSIGHT_002031 It inferred and enriched multiple flight cards on JFlights (https://www.jmail.world/flights). I was really shook when I first saw this. | | |
| ▲ | adit_a 7 hours ago | parent [-] | | This might be our coolest case study yet. Thanks for the mention! |
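For readers following lukeigel's description above, here is a rough sketch of the two data shapes it mentions: typed blocks from the parsing pass, and the field lists used for structured extraction. This is only an illustration under assumptions. The `Block` class, the numeric confidence, the bounding-box layout, and both helper functions are hypothetical and are not Reducto's actual API or response format. Only the field names come from the comment.

```python
from dataclasses import dataclass

# Hypothetical shape of one parsed block. The attributes mirror the comment
# (type, bounding box, content, confidence); this is not Reducto's real schema.
@dataclass
class Block:
    type: str                                # e.g. "Title", "Text", "Figure"
    bbox: tuple[float, float, float, float]  # assumed (left, top, width, height)
    content: str
    confidence: float                        # assumed to be a number in [0, 1]

# Field names are from the comment; the name -> Python type layout is assumed.
EMAIL_FIELDS = {"from": str, "to": str, "subject": str, "date": str, "body": str}
FLIGHT_FIELDS = {
    "departure_airport": str,
    "arrival_airport": str,
    "passengers": list,
    "date": str,
}

def missing_or_mistyped(record: dict, fields: dict) -> list[str]:
    """Return the names of fields that are absent or have the wrong type."""
    return [
        name for name, expected in fields.items()
        if not isinstance(record.get(name), expected)
    ]

def confident_text(blocks: list[Block], threshold: float = 0.8) -> list[str]:
    """Keep the content of blocks at or above a confidence threshold."""
    return [b.content for b in blocks if b.confidence >= threshold]
```

A check like `missing_or_mistyped(record, FLIGHT_FIELDS)` is one way to flag extracted records that need a human look before they become a flight card; the threshold in `confident_text` is an arbitrary placeholder.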
|
|
|
|
| ▲ | defrost a day ago | parent | prev | next [-] |
| One interesting thread to pull is "Stuff released and then Yanked back" ... Images removed from Epstein files less than a day after being posted - https://www.abc.net.au/news/2025-12-21/images-removed-from-e... promises all the sleuthing excitement of chasing the significance of Donald in a Drawer. |
| |
| ▲ | wahnfrieden a day ago | parent | next [-] | | Images were also planted to falsely suggest incriminating evidence. | | |
| ▲ | bryanrasmussen a day ago | parent | next [-] | | while true, it would probably be useful to provide examples. The one that I am aware of seems to be a picture showing Clinton, Michael Jackson, and Diana Ross with "redacted" victims https://www.imdb.com/news/ni65628031/ https://bsky.app/profile/meidastouch.com/post/3mag7myutmc2d however it seems that this photo is actually taken from a 2003 Democratic fundraiser, and the redacted images of victims were of Diana Ross' son Evan, and Michael Jackson's kids, Paris and Prince Jackson. This may or may not be accurate either, since I have not been able to dig down into the photo and determine if it has any connections to a supposed 2003 fundraiser. But it seems more likely to be true than not that this was sloppily planted evidence that was especially insultingly fake. on edit: looking closer does not seem to be exact same photos, but instead two different photos taken at the same time and place, so in the 2003 Dem fundraising, but a different photo of that. So it could be that Epstein had it and DOJ thought hey, look at these pervs! Let's release!! | | |
| ▲ | pohl 11 hours ago | parent | next [-] | | Is it possible that one is an input photo and the other is generative AI output? | |
| ▲ | Arn_Thor 19 hours ago | parent | prev [-] | | As you say, it's not the same photo. If the one in the dump was in Epstein's possession, the reason for the redactions are either that some drone in the DOJ just redacted all children out of habit, or that it was deliberately done in such a way as to frame Clinton. I can't decide which I find more credible. | | |
| ▲ | bryanrasmussen 19 hours ago | parent | next [-] | | I think if it hadn't been those adults with the kids an alert staffer might have thought "whose kids are these, these aren't young teenage girls, I better double check" But Michael Jackson, kids, Clinton arms around him, Diana Ross with young male, they're thinking they walked into an armory filled with nothing but smoking guns! | |
| ▲ | gruez 12 hours ago | parent | prev | next [-] | | >the reason for the redactions are either that some drone in the DOJ just redacted all children out of habit, or that it was deliberately done in such a way as to frame Clinton They were supposed to redact all minors, not just "victims". | |
| ▲ | dontlaugh 14 hours ago | parent | prev [-] | | There’s no need to frame Clinton, there is plenty of evidence he was friends with and spent a lot of time with Epstein. Similar situation with Trump, for that matter. | | |
| ▲ | brookst 11 hours ago | parent [-] | | It is perfectly possible, even common, to frame the guilty. It’s easier than finding real evidence. | | |
|
|
| |
| ▲ | wahnfrieden a day ago | parent | prev [-] | | I see people are not clued into this and incredulously downvote because the file release appears to be in good faith to them, such that illegal evidence tampering is out of the question. See https://news.ycombinator.com/item?id=46341688 | | |
| |
| ▲ | gazabbqparty 15 hours ago | parent | prev [-] | | [flagged] |
|
|
| ▲ | genghisjahn 12 hours ago | parent | prev | next [-] |
| But whoever’s doing the redacting sees the original, right? What prevents the redactor from saying, “here’s what the document really said,” or “here’s who’s in the image, I saw it before I redacted it”? |
| |
| ▲ | freedomben 12 hours ago | parent | next [-] | | The idea of spending the rest of their life in prison is what stops them | | |
| ▲ | helterskelter 9 hours ago | parent [-] | | Yeah but a few words from somebody like Ghislaine could completely fuck shit up for a lot of people. Of course, she'll have hanged herself shortly afterward while the security cameras were malfunctioning. |
| |
| ▲ | sigwinch 12 hours ago | parent | prev | next [-] | | Part of the law mandates that all redactions will be listed for Congress within 15 days. | |
| ▲ | mcintyre1994 12 hours ago | parent | prev | next [-] | | I’d guess a first pass is done automatically? E.g., if a page mentions Trump, just redact that whole page/paragraph/etc. So the people who have done the closer reading to redact further probably don’t actually know the scale of what was already redacted. Just a guess though. | |
| ▲ | immibis 11 hours ago | parent | prev | next [-] | | People who they think will do this don't get to be redactors. It's all about power and relationships, not technology. | |
| ▲ | exe34 6 hours ago | parent | prev | next [-] | | Given how MTG went completely silent despite her high profile platform, I'm guessing the civil (or at this point, royal) servants don't want their families harmed. | |
| ▲ | chiefalchemist 12 hours ago | parent | prev [-] | | That’s a good point. I would imagine they break it up into pieces - in a reCAPTCHA sorta way - and any given person sees a sentence or a piece of a sentence. An alternative would be to strip out all obvious known words and only leave unknowns (i.e., names) and then have those fragments reviewed (in a reCAPTCHA sorta way). Finally, for images, cover all faces and then one by one decide which should remain covered and which should not. LOTS of work but there are workflows to mitigate the ability for reviewers to connect more than they should. |
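The fragment idea in chiefalchemist's comment can be sketched roughly as follows. This is a toy illustration of the commenter's speculation, not a description of how any agency or newsroom actually handles redaction. The function names, the naive sentence splitter, and the round-robin assignment are all assumptions made for the example.

```python
import re
from collections import defaultdict

def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def assign_fragments(
    text: str, reviewers: list[str]
) -> dict[str, list[tuple[int, str]]]:
    """Deal sentences round-robin so neighbouring sentences go to different reviewers."""
    if len(reviewers) < 2:
        raise ValueError("need at least two reviewers to separate neighbours")
    queue = defaultdict(list)
    for index, sentence in enumerate(split_sentences(text)):
        # Keep the index so decisions can be merged back in document order.
        queue[reviewers[index % len(reviewers)]].append((index, sentence))
    return dict(queue)
```

With two reviewers, each one sees every other sentence, so no single reviewer reads two adjacent sentences. A real workflow would presumably also need to shuffle across documents, since a reviewer who sees many fragments of one file can still piece things together.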
|
|
| ▲ | alex1138 a day ago | parent | prev [-] |
| I'm being snarky and this isn't such a serious comment and I don't really mean this for Gemini but can you imagine using something like Gemini ("Hi, please comb through this") and it just refuses on ethical grounds |
| |
| ▲ | lukeigel 18 hours ago | parent [-] | | We found that Codex indeed refuses but Claude + Gemini are willing to RAG it | | |
| ▲ | lukeigel 8 hours ago | parent | next [-] | | also, shoutout to Jason Liu (https://news.ycombinator.com/user?id=jxnlco) for discovering that one. His turbopuffer-based version of Jemini is coming soon! | |
| ▲ | muzani 14 hours ago | parent | prev [-] | | Usually Claude is the prude. Personally I haven't even tried for fear what I'd find. I can stomach homicide and war pictures, but Epstein is too much. | | |
| ▲ | alex1138 12 hours ago | parent [-] | | I just have real institutional problems with Google: they have all the best tech minds, but some things are just off limits to them because they're being politically correct. And no, not Epstein. It's a general statement, but it's disappointing that they're like this (and of course Gemini was famously the one that gave black Nazis and things like that) | | |
| ▲ | underlipton 5 hours ago | parent [-] | | Google has never fixed their black people/gorilla issue. The foundational tech that all of their products run on going back a decade is fundamentally flawed (and outputs outputs that many would say align with racist ideologies, among others). |
|
|
|
|