Posted by lukeigel 12/20/2025
Please AMA!
On the other hand, it's way more information than I expected. I can see why someone would hesitate to release them - there's a lot to sift through and it's likely even the government couldn't sift through all of them to make sure their friends weren't mentioned somewhere.
The House Oversight Committee's giant drop in November had tons of data we still didn't take advantage of even after doing the original Jmail, like flight logs.
For the Yahoo release, which is still ongoing, the folks at Drop Site News (see https://www.jmail.world/about) are handling the manual redaction, which has been very time-consuming, even with tons of AI to help in the background.
For now we're focusing on fixing bugs and keeping the site alive, because we're already seeing an insane wave of traffic.
We did an initial parsing pass of all four DOJ document batches on Friday. This takes a raw PDF and returns chunks containing typed blocks, each with a type (Title, Text, Figure, etc.), bounding boxes, content, and confidence scores. For PDFs that were just scans of photographs (which was like 90% of new content in Friday's release), it gave in-depth descriptions of those! You can type search terms like "door" at https://www.jmail.world/photos to see what I mean.
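To make the "typed blocks" idea concrete, here is a rough Python sketch of how a photo search over parsed output could work. The Block shape, the field names and the search_figures helper are assumptions for illustration only; they are not the parser's real response format, and the sample data is made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """Hypothetical shape of one parsed block (not the real API response)."""
    type: str                                 # e.g. "Title", "Text", "Figure"
    bbox: Tuple[float, float, float, float]   # left, top, width, height
    content: str                              # text, or a description for figures
    confidence: float                         # 0.0 to 1.0


def search_figures(blocks: List[Block], term: str,
                   min_confidence: float = 0.5) -> List[Block]:
    """Return Figure blocks whose description mentions the search term."""
    needle = term.lower()
    return [
        b for b in blocks
        if b.type == "Figure"
        and b.confidence >= min_confidence
        and needle in b.content.lower()
    ]


# Made-up sample blocks, purely for illustration.
SAMPLE_BLOCKS = [
    Block("Title", (0.1, 0.1, 0.8, 0.05), "Exhibit A", 0.98),
    Block("Figure", (0.1, 0.2, 0.8, 0.6), "A wooden door at the end of a hallway", 0.91),
    Block("Figure", (0.1, 0.2, 0.8, 0.6), "A blurry door frame", 0.30),
]

hits = search_figures(SAMPLE_BLOCKS, "door")
```

With the sample above, only the high-confidence figure comes back; the blurry one falls under the assumed 0.5 threshold.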
For apps like Jmail and JFlights we use their structured extraction endpoint instead—you define a schema (e.g. {from, to, subject, date, body} for emails or {departure_airport, arrival_airport, passengers[], date} for flights) and it pulls those fields directly into JSON.
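As a sketch of what consuming that JSON might look like on the app side: the FlightRecord class and parse_flight helper below are hypothetical, the sample values are placeholders, and nothing here calls the actual extraction endpoint. It only mirrors the flight schema quoted above.

```python
import json
from dataclasses import dataclass, field
from typing import List

REQUIRED_FIELDS = ("departure_airport", "arrival_airport", "date")


@dataclass
class FlightRecord:
    """Mirrors the flight schema from the comment; the class itself is illustrative."""
    departure_airport: str
    arrival_airport: str
    date: str
    passengers: List[str] = field(default_factory=list)


def parse_flight(raw_json: str) -> FlightRecord:
    """Validate one extracted JSON object and turn it into a FlightRecord."""
    data = json.loads(raw_json)
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return FlightRecord(
        departure_airport=data["departure_airport"],
        arrival_airport=data["arrival_airport"],
        date=data["date"],
        passengers=list(data.get("passengers", [])),
    )


# Placeholder values, not taken from any real document.
SAMPLE_JSON = (
    '{"departure_airport": "AAA", "arrival_airport": "BBB", '
    '"date": "2000-01-01", "passengers": ["Passenger One"]}'
)
flight = parse_flight(SAMPLE_JSON)
```

Validating required fields up front means a page that the extractor only partly understood fails loudly instead of producing a half-empty flight card.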
The JFlights example served as the best ad for Reducto and how doc parsing technology can speed up hours of journalistic investigations like this.
See for yourself. Given this document
https://www.jmail.world/drive/HOUSE_OVERSIGHT_002031
It inferred and enriched multiple flight cards on JFlights (https://www.jmail.world/flights). I was really shook when I first saw this.
Images removed from Epstein files less than a day after being posted - https://www.abc.net.au/news/2025-12-21/images-removed-from-e...
promises all the sleuthing excitement of chasing the significance of Donald in a Drawer.
https://www.imdb.com/news/ni65628031/
https://bsky.app/profile/meidastouch.com/post/3mag7myutmc2d
however it seems that this photo is actually taken from a 2003 Democratic fundraiser, and the redacted images of victims were of Diana Ross' son Evan, and Michael Jackson's kids, Paris and Prince Jackson. This may or may not be accurate either, since I have not been able to dig down into the photo and determine if it has any connections to a supposed 2003 fundraiser.
But it seems more likely to be true than not that this was sloppily planted evidence that was especially insultingly fake.
On edit: looking closer, these do not seem to be the exact same photos, but two different photos taken at the same time and place, so both from the 2003 Dem fundraiser. So it could be that Epstein had it and DOJ thought hey, look at these pervs! Let's release!!
They were supposed to redact all minors, not just "victims".
Similar situation with Trump, for that matter.
All the evidence I've seen is person A knew or spoke to Epstein. Where is the evidence linking any person to a specific sexual encounter?
And no, not Epstein. It's a general statement; but it's disappointing that they're like this (and of course Gemini was famously the one that gave black Nazis and things like that)
Of course, she'll have hanged herself shortly afterward while the security cameras were malfunctioning.
An alternative would be to strip out all obvious known words and only leave unknowns (i.e., names) and then have those fragments reviewed (in a reCAPTCHA sorta way).
Finally, for images, cover all faces and then decide one by one which should remain covered and which should not.
LOTS of work but there are workflows to mitigate the ability for reviewers to connect more than they should.
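A minimal Python sketch of the "strip known words, review the rest" idea, assuming a plain word list stands in for the dictionary of known words. The unknown_fragments helper and the tiny vocabulary are made up for illustration; a real workflow would need a much larger vocabulary and handling for names that are also ordinary words.

```python
import re
from typing import List, Set


def unknown_fragments(text: str, known_words: Set[str]) -> List[str]:
    """Return the tokens that are NOT in the known-word list, in order.

    These leftovers (mostly names, in theory) are what a reviewer would see,
    without the surrounding sentence.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    return [t for t in tokens if t.lower() not in known_words]


# Tiny stand-in vocabulary; an assumption, not a real dictionary.
KNOWN = {"please", "call", "tomorrow", "about", "the", "meeting", "with"}

fragments = unknown_fragments("Please call Alice tomorrow about the meeting with Bob", KNOWN)
```

Because each reviewer would only see isolated fragments like these, they get little context to connect a name to a specific message.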
At the risk of stating the obvious, the functionality isn't actually cloned, only the UI. The actual code powering Gmail probably dates back to the late 80s or early 90s and has had several hundred thousands of hours of work put into it. This is just a webpage that looks kind of similar.
I point this out only because I've seen people saying that software businesses don't have moats anymore because of this, which is taking away a completely false lesson.
I mean it has happened in other Google products...
They bought both Deja and Neotonic.
Mails could even be in the trillions.
The UI cloning doesn't feel exactly correct either; there are things that are slightly off.
But I just find the "cloned" wrong, because obviously you cannot send an email from this account, you cannot log in to the service as Jeffrey Epstein, you cannot delete emails, create alerts based on searches, do actions on selected emails (create new tag, move under that tag)
there are so many functionalities that are not cloned because obviously they could not be cloned because they would make no sense for what this project is. So just the praise for cloning so quickly makes me sort of mad.
You could theoretically make something like this that allowed login so you got a personalized Epstein mailbox, and then could do all that, and perhaps get more mails sent in as files get released, and perhaps create Google alerts on Epstein in the news etc. that would come as mails, and maybe the code could put news that came in into the appropriate tags etc.
But until that time "cloned" is just very wrong.
no. google did not exist until the late 90s.
various forms of internet email sure did, but most popular mtas of the google era shared very little code with predecessors from the 80s and early 90s (maybe sendmail) and google almost certainly wrote their own from scratch.
but your first point. that an archive browser that looks like gmail is not equivalent to a full tilt email service backend is valid.
- Fetching email messages
- Parsing email headers
- Mime parsing
- Converting the text of email bodies into UTF-8
- Threading messages
- Eliding reply text
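For a sense of what several of the steps listed above involve, here is a small sketch using only Python's standard-library email package. It covers header parsing, MIME and charset decoding, naive threading via In-Reply-To, and crude reply elision; fetching is left out. The helper names and sample messages are made up, and this is in no way how Gmail (or Jmail) implements any of it.

```python
from email import message_from_string, policy
from typing import Dict, List, Optional


def parse_eml(raw: str) -> Dict[str, Optional[str]]:
    """Parse selected headers and the plain-text body of one message."""
    msg = message_from_string(raw, policy=policy.default)

    def header(name: str) -> Optional[str]:
        value = msg[name]
        return str(value) if value is not None else None

    # get_body walks the MIME tree; get_content undoes the transfer
    # encoding and charset, giving a normal Python (Unicode) string.
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content() if part is not None else ""
    return {
        "message_id": header("Message-ID"),
        "in_reply_to": header("In-Reply-To"),
        "from": header("From"),
        "subject": header("Subject"),
        "body": body,
    }


def elide_quoted(body: str) -> str:
    """Drop lines that start with '>' (a crude form of reply elision)."""
    kept = [ln for ln in body.splitlines() if not ln.lstrip().startswith(">")]
    return "\n".join(kept).strip()


def thread_by_root(messages: List[dict]) -> Dict[str, List[str]]:
    """Group message ids under the root reached by following In-Reply-To."""
    by_id = {m["message_id"]: m for m in messages}
    threads: Dict[str, List[str]] = {}
    for m in messages:
        root, seen = m, set()
        # Follow the reply chain upward; 'seen' guards against cycles.
        while root["in_reply_to"] in by_id and root["message_id"] not in seen:
            seen.add(root["message_id"])
            root = by_id[root["in_reply_to"]]
        threads.setdefault(root["message_id"], []).append(m["message_id"])
    return threads


# Made-up sample messages.
RAW_A = (
    "Message-ID: <a@example.com>\nFrom: Alice <alice@example.com>\n"
    "Subject: Hello\nMIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: quoted-printable\n\nCaf=C3=A9 at noon?\n"
)
RAW_B = (
    "Message-ID: <b@example.com>\nIn-Reply-To: <a@example.com>\n"
    "From: Bob <bob@example.com>\nSubject: Re: Hello\n\n"
    "Sure.\n> see you at noon?\n"
)
first, reply = parse_eml(RAW_A), parse_eml(RAW_B)
threads = thread_by_root([first, reply])
```

Real mail is far messier than this (broken headers, missing Message-IDs, top-posting, HTML-only bodies), which is a large part of why production mail code takes so long to get right.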
Given that the official story is that pb made the first version of Gmail in a day, does anyone actually believe that he wrote the code for any of those things in a day? If you honestly believe that, I have a bridge to sell you.
Wait till you learn that the source code in Chrome also predates the existence of Google.
And this is exactly why I stopped participating in discussions on reddit and never did on LinkedIn. Discussions on HN are so much more civil and respectful.
P.S. if the top level comment was indeed posted by a "less technically inclined" person, I hope this is a humbling, positive educational experience, at least that's how I would take it
Email as a technology is ancient by today's standards. The SMTP protocol was established in 1982. Sendmail dates back to the early '80s, and its predecessor delivermail to 1979.
Another great article
Can you expand on this please? Really cool btw.
The cynic in me would assume that someone with a lot of money wants to hide some of the emails and the best way to do that (at this point) is to release them filtered with a great UI.
The total archive size is 300GB. AFAIK they have only released around 2GB. Curious what is in the rest of it assuming it does not get [redacted] out or deleted. I am also curious how they intend to release the rest of it in time to meet the requirements of the act. Discussion [1] Epstein Files bill sponsor Ro Khanna and Hassan, no dogs being zapped.
[1] - https://www.youtube.com/watch?v=KT2u0Fp3hQg [video][1hr12m]
Probably a lot of CSAM, if the Mossad blackmail op theory of Epstein is true.
How could you tell?
A job for an LLM…
but they just copy the "UI" not the whole product
Jared kushner, is that you?
You're welcome, of course, to make your substantive points thoughtfully.
I would bet the Gmail team has single employee salaries in that range.
if only there were some kind of universal summary engine that never gets tired and is essentially free.
Also, interesting that this one got by him (unopened, unread, filtered from inbox) and the timing of it being near his final arrest (coincidence, but still) https://www.jmail.world/eml/0b80588f551f3d097695f1c9507b6572
He sure bought a lot of books. I found this that is not a book.
Indeed. Though the high school uniform thing seems to be a fairly mainstream fetish, with hormones raging at that time in people's lives. But granted, if it's not for your partner or (of age) mistress...
You'll get the whole RAG/context layer going in an hour. Can self-host too if you prefer.
This is really impressive btw!
2007 -> 2017:
https://www.goodreads.com/list/show/237057.Epstein_s_Library...
2018 -> 2019:
https://www.goodreads.com/list/show/237060.Epstein_s_Library...
You can only add 100 books in a Goodreads list so I had to create two. In the lists I linked back to the original Bloomberg article.
Jmail is the only place to see those Amazon order emails by the way! Those are from his Yahoo, which Bloomberg announced in September but Drop Site News actually let us release this month. It all came from https://ddosecrets.com/article/epstein-emails (redactions of the full dataset still taking place)
I would have thought a quasi-billionaire like Epstein would have a personal concierge do the purchases for him. I certainly would if I was that rich. I would certainly not buy a shower head, I would show my concierge a picture of the shower I want and have him appoint a plumber so that I can have my shower the way I want it when I come back the next day. What's the point of being rich if it's to buy shower heads on Amazon?
He also seemed bad at delegating his interior decorating to a professional, judging by the photos of his island.
It looks pretty anonymous to me - a completely normal shoddy comms rack from the early-mid 2000s. The only real distinguishing feature is the fibre splice/breakout work taped to a bit of plywood at a funny angle, but even that's not so very out of the ordinary.
Perhaps you're confusing it with any of a hundred thousand pictures of similar setups from that era?
If you cannot be arsed to perform an image search that’s on you. You don’t speak for the rest of HN readers. Have a good one!
I was just setting you up for realistic expectations, seeing as how no one has commented under either of your replies, except to criticize them, so I expect this 'legion of people' will never appear.
Good luck!
Source?
We found that Volume 2 and Volume 4 had the most never-before-seen stuff.
https://www.justice.gov/epstein/doj-disclosures/data-set-2-f... https://www.justice.gov/epstein/doj-disclosures/data-set-4-f...
Also, this morning they quietly released volumes 5-7. Will have to find out how much of this is new.
[0] https://en.wikipedia.org/wiki/Steve_Bannon#Media_and_investi...
[1] https://www.yahoo.com/entertainment/movies/articles/steve-ba...
Did you imagine rich and successful, high status people hang out with random wage workers they find in a bar somewhere in the midwest?
Hopefully I can merge some UI for it soon, but I'm away from my main computer right now.
EDIT: I'm going to text my friend Luke to comment and answer any questions about the project.
EDIT2: I am also happy to expand on the technical details of the project once I get a stable internet connection.
I'm one of the co-creators of Jmail alongside Riley Walz. We launched a Gmail-like view of Epstein's inbox last month. It got millions of page views, tons of really amazing requests to collaborate on making more related data accessible, and even new Yahoo emails that no one else has allowed the public to see.
Yesterday's DOJ drop resulted in this very spontaneous rag-tag team of friends coming to my place in SF and each making their own app in the "Jmail" Suite. Riley and I are pretty shocked by how versatile this parody style is for visualizing Epstein's 20-year digital footprint.
It's been a ton of fun and we're working hard to polish each view here.
Specifically at https://www.jmail.world/photos
You probably meant to use "in light of"
1. DOJ (The White House's docs that they were required by law to drop yesterday plus many court documents, videos, and other docs from many news cycles this year)
2. HOUSE_OVERSIGHT (the House Oversight Committee's releases. giant November drop that led to the original Jmail, then some photo drops this month)
3. Yahoo emails (originally sourced by DDoSecrets, then provided to us, redacted and verified by Drop Site News)
There is so much material in HOUSE_OVERSIGHT that never appears in DOJ, and vice versa. And then the Yahoo drop reveals even more new material. It feels like three odd slices of a giant dataset that keeps getting released.
re: people's complaints about yesterday's release having way too many redactions, I have no idea how much they over-redacted. I hear that they will release even more quite soon though.
Do you have a page about each dataset you're sourcing and the background on them, like you provide here?
The "EFTA00000468" saga has me distrusting the authenticity of most of these datasets.
Re: the DDoSecrets emails though (YAHOO dataset), I have more to share.
Drop Site News agreed to give us access to the Yahoo dataset discovered by DDoSecrets, but on the condition that we help redact it. It's a completely unfiltered dataset. It's literally just .eml files for jeeprojects@yahoo.com. It includes many attached documents. There is no illegal imagery, but it has photos of Epstein's extended family (nephews, nieces, etc) and headshots of many models that Epstein's executive assistant would send to him. I was quite shocked that this thing existed.
We built some internal redaction tools that the Drop Site team is now using to comb through all of this. We've released 5 batches of the Yahoo mail now, with the 1k+ Amazon receipts being the most recent.
A few thoughts on how we do redaction are here: https://www.jmail.world/about.
Unlike the DOJ, we've tried to minimize the ambiguity about what was redacted.
For example: all redacted images are replaced with a Gemini-generated description of that photograph.
Another example: we are aggressively redacting email addresses and phone numbers of normal people to avoid spamming them. Perhaps others would leave it all in, but Riley and I don't want to be responsible for these people's lives getting disrupted by this entire saga. For example, we redacted this guy's email but not his name: https://www.jmail.world/thread/4accfb5f3ed84656e9762740081a4...
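A rough sketch of that kind of contact redaction in Python, keeping names but masking email addresses and phone numbers. The regular expressions and the redact_contacts helper are assumptions for illustration, not the internal tooling described here, and the phone pattern is deliberately crude: it will also catch things like ISO dates, so a human pass is still needed.

```python
import re

# Simplified patterns; real-world addresses and numbers are messier.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# A digit, then at least 7 digits/separators, then a digit.
# Over-matches on purpose (e.g. dates), so results need human review.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_contacts(text: str) -> str:
    """Mask email addresses and phone numbers while leaving names intact."""
    text = EMAIL_RE.sub("[redacted email]", text)
    return PHONE_RE.sub("[redacted phone]", text)


# Made-up sample line.
redacted = redact_contacts("Jane Doe <jane@example.com>, 555-123-4567")
```

Using visible placeholders rather than deleting the text keeps it obvious to readers what kind of information was removed.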
Riley and I were not expecting this type of scope when we first dropped Jmail. Jmail is an interesting side project for us, and this new dataset requires full-time attention. Thankfully we have help though. We're happy to take on this responsibility given how helpful, thoughtful and careful both the Drop Site and DDoSecrets teams have been here.
Has anyone written a parser for the text messages? A messages-like UI to be able to read through all the texts would be super interesting too. The format DOJ released them in is impossible to follow.
https://michelcrypt4d4mus.github.io/epstein_text_messages/
He also shouted us out last month which was very kind of him
It doesn't belong in the Epstein Files, and doesn't need to be censored either, but the way it is framed in the DoJ release implies guilt where there is none.
That being said, I think we can demand a level of due diligence from public institutions that entails only censoring actual victims on actual pieces of evidence, instead of mindlessly placing black squares on the faces of news article pictures found on his computer. Never mind that nobody can explain yet how this particular picture ended up in the grand jury files anyway.
who planted them?
Here's at least one notable image.
The original site was on Railway and written in Pug! It crashed after Riley's tweet first went viral, then Riley did the heroic work of caching it all with Cloudflare after waking up to the site being down. After millions of unique visitors we racked up about $10 in costs.
This time we switched to Next.js 16 + Vercel, used Cloudflare R2 for asset hosting, and used Neon as the db. R2 has free egress, and Vercel + Next is cheap if cached correctly.
A special someone at Vercel gave us some tips on caching this one earlier today. We started by just using unstable_cache all over the place, and now we're migrating to ISR + full static pre-generation of as many pages as we can via generateStaticParams.
I know that Luke was working on stuff so as not to hit the database as much, but I was in the middle of a flight as that was happening so he'll have to come and provide more details.