Posted by ComputerGuru 1 day ago

Recreating Epstein PDFs from raw encoded attachments (neosmart.net)
208 points | 45 comments
blindriver 3 hours ago
On one hand, the DOJ gets shit for taking too long to produce the documents; on the other, it gets shit for redaction mistakes that happen because there are 3 million pages of documents.
rapind 1 hour ago
What they are redacting is pretty questionable, though. Entire pages are being suspiciously redacted with no explanation (which they are supposed to provide). This is just my opinion, but I think it's pretty hard to argue they're making an honest, best effort here. Remember, they have all lied about and changed their story on the Epstein "files" several times now (by "all" I mean Bondi, Patel, Bongino, and Trump).

It's really really hard to give them the benefit of the doubt at this point.

thereisnospork 3 hours ago
Considering the justice-to-document ratio, that's kind of on them regardless.
nubg 3 hours ago
Wait, would this give us the unredacted PDFs?
ryanSrich 28 minutes ago
That's the idea, yeah. Other people are actively working on this too. You can follow vx-underground on Twitter; they're tracking it.
poyu 2 hours ago
I think it's the PDF files that were attached to the emails, since they're base64 encoded.
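To make "base64-encoded attachment" concrete, here is a minimal Python sketch. It assumes the printed lines have already been transcribed with no OCR errors, and `decode_printed_attachment` is a made-up helper name, not anything from the article. MIME limits base64 lines to 76 characters, so the lines are joined, decoded, and checked for the `%PDF-` header a PDF file normally starts with.

```python
import base64

def decode_printed_attachment(lines: list[str]) -> bytes:
    """Join transcribed base64 lines and decode them back into file bytes."""
    # MIME wraps base64 at 76 characters per line, so the payload spans many
    # lines; strip stray whitespace at the line ends before joining.
    text = "".join(line.strip() for line in lines)
    # validate=True rejects characters outside the base64 alphabet instead of
    # silently skipping them, which surfaces obvious transcription garbage.
    data = base64.b64decode(text, validate=True)
    if not data.startswith(b"%PDF-"):
        raise ValueError("decoded bytes do not start with a PDF header")
    return data

# Usage: simulate a printed attachment by encoding a tiny fake PDF.
sample = b"%PDF-1.4\n" + b"0" * 120 + b"\n%%EOF\n"
printed_lines = base64.encodebytes(sample).decode("ascii").splitlines()
print(decode_printed_attachment(printed_lines)[:8])  # b'%PDF-1.4'
```

Note this only catches characters that are outside the alphabet; a misread that swaps one valid base64 character for another still decodes.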
FarmerPotato 4 hours ago
If only Base64 had used a checksum.
zahlman 4 hours ago
"had used"? Base64 is still in very common use, e.g. embedded within JSON and in "data URLs" on the Web.
bahmboo 3 hours ago
"had" in the sense of when it was designed and introduced as a standard
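The missing-checksum point can be shown directly. Base64 carries no integrity check, so a single misread character still decodes without complaint, just to different bytes. The Python sketch below uses hypothetical `encode_with_crc`/`decode_with_crc` helpers (not part of any standard) that append a CRC-32 before encoding. Because CRC-32 detects every error confined to a burst of 32 bits or fewer, one wrong base64 character (6 bits) in the data is always caught.

```python
import base64
import zlib

def encode_with_crc(data: bytes) -> str:
    # Hypothetical scheme: append a big-endian CRC-32 of the data, then base64 it all.
    crc = zlib.crc32(data).to_bytes(4, "big")
    return base64.b64encode(data + crc).decode("ascii")

def decode_with_crc(text: str) -> bytes:
    raw = base64.b64decode(text)
    data, crc = raw[:-4], raw[-4:]
    if zlib.crc32(data).to_bytes(4, "big") != crc:
        raise ValueError("checksum mismatch: the base64 text was altered")
    return data

def misread(text: str, i: int) -> str:
    # Simulate one OCR-style error: replace character i with a different base64 letter.
    wrong = "A" if text[i] != "A" else "B"
    return text[:i] + wrong + text[i + 1:]

payload = b"%PDF-1.4 example payload for the checksum demo"

# Plain base64: the corrupted text decodes without any error, just to different bytes.
plain = base64.b64encode(payload).decode("ascii")
print(base64.b64decode(misread(plain, 10)) == payload)  # False

# With the appended CRC-32, the same single-character error is detected.
try:
    decode_with_crc(misread(encode_with_crc(payload), 10))
except ValueError as err:
    print(err)  # checksum mismatch: the base64 text was altered
```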
legitster 3 hours ago
Given how much of a hot mess PDFs are in general, it seems like it would behoove the government to just develop a new, actually safe format to standardize around for government releases and make it open source.

Unlike the makers of every other would-be PDF replacement, the federal government doesn't have to worry about adoption.

Spooky23 2 hours ago
You’re thinking about this as a nerd.

It’s not a tools problem, it’s a problem of malicious compliance and contempt for the law.

derwiki 3 hours ago
JPEG?
legitster 3 hours ago
That's not really comparable; it needs to be editable and searchable.
recursive 1 hour ago
Lossy
linuxguy2 5 hours ago
Love this, absolutely looking forward to some results.
eek2121 4 hours ago
Honestly, this is something that should've been kept private until every single one of the files is out in the open. Sure, mistakes are being made, but if you blast them onto the internet, they WILL eventually get fixed.

Cool article, however.

iwontberude 4 hours ago
This one is irresistible to play with. Indeed a nerd snipe.
netsharc 4 hours ago
I doubt the PDF would be very interesting. There are enough clues in the human-readable parts: it's an invite to a benefit event in New York (the filename calls it DBC12) scheduled for December 10, 2012, at 8pm. Good old-fashioned searching could probably uncover what DBC12 was, although maybe not; it probably wasn't a public event.

The recipient is also named in there...

RajT88 4 hours ago
There are potentially a lot of files that were attached and printed out this way.

Searching the DOJ website (which we shouldn't trust) for the query "Content-Type: application/pdf; name=" yields maybe half a dozen similarly printed base64 attachments.

There are probably lots of images attached the same way as well (probably mostly junk). I recently deleted all my archived copies once I learned how not-quite-redacted they were, so I'll leave that exercise to someone else.
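If the raw MIME text of one of these emails can be recovered intact, Python's standard `email` package can pick out the PDF parts by the same `Content-Type: application/pdf` header the search above matches, and it undoes the base64 transfer encoding itself. This is a minimal sketch under that assumption: `extract_pdfs` is a hypothetical helper, and the message is built in code as a stand-in, since real transcriptions of the printed pages would need OCR cleanup first.

```python
from email import message_from_string, policy
from email.message import EmailMessage

def extract_pdfs(raw_mime: str) -> dict[str, bytes]:
    """Return {filename: decoded bytes} for every application/pdf attachment."""
    msg = message_from_string(raw_mime, policy=policy.default)
    pdfs: dict[str, bytes] = {}
    for part in msg.iter_attachments():
        if part.get_content_type() == "application/pdf":
            name = part.get_filename() or "unnamed.pdf"
            # decode=True reverses the Content-Transfer-Encoding (base64 here).
            pdfs[name] = part.get_payload(decode=True)
    return pdfs

# Stand-in message: a short text body plus one tiny fake PDF attachment.
demo = EmailMessage()
demo["Subject"] = "Invitation"
demo.set_content("See attached.")
demo.add_attachment(b"%PDF-1.4\n%%EOF\n", maintype="application",
                    subtype="pdf", filename="invite.pdf")

print(extract_pdfs(demo.as_string()))
```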

zahlman 4 hours ago
> …but good luck getting that to work once you get to the flate-compressed sections of the PDF.

A dynamic-programming-type approach might still help. One version or the other of the character might produce invalid flate data while the other is valid, or might give an implausible result.
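One way to read that suggestion, sketched in Python: treat every uncertain character as a set of candidates, decode four base64 characters (three bytes) at a time, and feed each guess into a copy of a `zlib` decompressor, dropping any branch that raises an error. Strictly, this is a pruned backtracking/beam search rather than a textbook dynamic program. It assumes the base64 text encodes a zlib-wrapped stream (header plus Adler-32 trailer, as PDF FlateDecode streams typically are) starting exactly at character 0; `recover` and its `beam` cap are hypothetical names. Deflate rejects only some bad bytes immediately, so many wrong guesses survive until the Adler-32 check at the end; scoring the decompressed output for plausibility would be the natural next filter.

```python
import base64
import itertools
import zlib

def recover(candidates: list[str], beam: int = 64) -> list[bytes]:
    """candidates[i] lists every character that position i of the base64 text
    might be (e.g. "l1I" for an ambiguous glyph). Returns the decoded byte
    strings whose zlib stream decompresses cleanly to the end."""
    # Each state pairs a live decompressor with the bytes fed into it so far.
    states = [(zlib.decompressobj(), b"")]
    for start in range(0, len(candidates), 4):
        group = candidates[start:start + 4]  # one base64 quantum = 3 bytes
        survivors = []
        for dec, raw in states:
            for combo in itertools.product(*group):
                chunk = base64.b64decode("".join(combo))
                trial = dec.copy()  # branch without disturbing the parent state
                try:
                    trial.decompress(chunk)
                except zlib.error:
                    continue  # invalid deflate data or bad checksum: prune
                survivors.append((trial, raw + chunk))
        # Crude cap on the search width; it can drop the right branch.
        states = survivors[:beam]
    # Keep streams that hit their end (Adler-32 verified) with nothing left over.
    return [raw for dec, raw in states if dec.eof and not dec.unused_data]

# Usage: compress some text, then mark two positions as ambiguous.
data = zlib.compress(b"hello world " * 20)
encoded = base64.b64encode(data).decode("ascii")
cands = list(encoded)
for i in (5, 13):
    cands[i] = "".join(sorted(set(encoded[i] + "l1I")))
print(data in recover(cands))  # True
```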

yunnpp 1 hour ago
Time to flex those LeetCode skills.