Posted by ComputerGuru 1 day ago
Claude Opus came up with this script:
It produces a somewhat readable PDF (the first page, at least) with this text output:
(I used the cleaned output at https://pastebin.com/UXRAJdKJ mentioned in a comment by Joe on the blog page)
Or worse. She did.
More likely it's just an oversight, but it could also be CYA for dragging their feet, like "you rushed us, and look at these victims you've retraumatized". There are software tools that detect nudity, and they're quite effective.
The challenge, as we're all experiencing together, is that the law is not inherently self-enforcing.
https://www.govinfo.gov/content/pkg/PLAW-119publ38/pdf/PLAW-... : the Attorney General was to have produced the entirety of the Epstein files, with very narrowly enumerated redactions, in December. She has not done so.
Furthermore, there are numerous allegations that the documents that have been released contain CSAM, which (referencing the PDF above) may fall afoul of 18 U.S.C. 2252–2252A.
In addition, one need only glance at the action in US courts to see egregious violations of the Constitution and valid court orders playing out daily.
https://www.documentcloud.org/documents/26513988-trorder0128...
https://storage.courtlistener.com/recap/gov.uscourts.mnd.230...
The legal situation regarding CSAM is very strict no matter the country, and I'd better hope no one here is dumb enough to post actual links.
1. Get an open-source PDF decoder
2. Decode bytes up to the first ambiguous char
3. See if the next bits are valid with a 1; if not, it's an l
4. You might need to backtrack if both 1 and l were valid
Being able to try each candidate char partway through decoding, instead of restarting from the beginning every time, cuts out the startup cost. That makes it feasible to test all the permutations automatically and in linear time.
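A rough Python sketch of steps 2-4, assuming the only OCR confusion is between 1 and l. Base64 decodes in independent 4-char groups (3 bytes each), so the search can go group by group rather than bit by bit. `search_base64` and the `is_valid_prefix` predicate are hypothetical names; the predicate stands in for "does the PDF/zlib decoder still accept these bytes", which you'd have to wire up yourself. It keeps each decoded prefix on the stack, so it's quadratic on big inputs: fine for showing the idea, not tuned for a multi-megabyte attachment.

```python
import base64
import itertools
from typing import Callable, Iterator

# Assumption: OCR only ever confuses "1" and "l"; extend this map for O/0 etc.
AMBIGUOUS = {"1": "1l", "l": "1l"}


def quantum_candidates(quantum: str) -> Iterator[str]:
    """Yield every spelling of a 4-char base64 group with 1/l swapped in."""
    options = [AMBIGUOUS.get(ch, ch) for ch in quantum]
    for combo in itertools.product(*options):
        yield "".join(combo)


def search_base64(
    text: str, is_valid_prefix: Callable[[bytes], bool]
) -> Iterator[bytes]:
    """Depth-first search over 1/l choices, pruning as soon as a prefix is rejected.

    Each 4-char group decodes to 3 bytes on its own, so a bad choice is caught
    right after its group instead of after decoding the whole file.
    """
    if len(text) % 4:
        raise ValueError("base64 text length must be a multiple of 4")
    stack = [(0, b"")]  # (index into text, bytes decoded so far)
    while stack:
        i, decoded = stack.pop()
        if i == len(text):
            yield decoded  # a full decoding that passed every prefix check
            continue
        for quantum in quantum_candidates(text[i : i + 4]):
            try:
                chunk = base64.b64decode(quantum, validate=True)
            except ValueError:  # binascii.Error subclasses ValueError
                continue
            prefix = decoded + chunk
            if is_valid_prefix(prefix):
                stack.append((i + 4, prefix))  # keep going from here later
```

For example, `search_base64("Y1B4", lambda b: all(32 <= x < 127 for x in b))` yields both `b"bPx"` and `b"cPx"`, because `YlB4` and `Y1B4` both decode to printable text. A real decoder check is what would have to break ties like that.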
The copy linked in the post:
https://www.justice.gov/epstein/files/DataSet%209/EFTA004004...
Three more copies:
https://www.justice.gov/epstein/files/DataSet%2010/EFTA02153...
https://www.justice.gov/epstein/files/DataSet%2010/EFTA02154...
https://www.justice.gov/epstein/files/DataSet%2010/EFTA02154...
Having several different copies might make it easier.
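If each copy is OCR'd separately, one cheap way to combine them is a per-character majority vote. A minimal sketch, assuming the transcriptions are already aligned to the same length (real OCR output usually needs an alignment step first) and that the copies' errors don't all land in the same places; a 1/l confusion that every copy shares won't be fixed this way. `majority_vote` is a hypothetical helper.

```python
from collections import Counter


def majority_vote(copies: list[str]) -> str:
    """Pick the most common character at each position across aligned copies."""
    if not copies or len({len(c) for c in copies}) != 1:
        raise ValueError("need at least one copy, all of the same length")
    # most_common(1) breaks ties in favor of the character seen first,
    # i.e. the one from the earliest copy in the list.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*copies))
```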
I consider myself fairly normal in this regard, but I don't have 76 friends to ask to do this, so I don't know how I'd go about it. Post an ad on Craigslist? Fiverr? Seems like a lot to manage.
Hmm. Anyone got some spare CPU time?
Follow-up: pdfimages is 13x faster than pdftoppm.
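For anyone repeating the comparison: pdftoppm renders every page to a bitmap, while pdfimages dumps the images already embedded in the PDF, which is likely where the speedup comes from. The two only give equivalent output when each page is essentially one scanned image, which is an assumption about these files. A small Python wrapper to time both; the function names and the 300 dpi default are my own choices, the flags (`-png`, `-f`, `-l`, `-r`) are standard poppler-utils options, and the 13x figure will vary by machine and file.

```python
import subprocess
import time
from typing import Optional


def pdfimages_cmd(pdf: str, prefix: str, first: Optional[int] = None,
                  last: Optional[int] = None) -> list[str]:
    """Extract embedded images as PNGs, optionally limited to pages first..last."""
    cmd = ["pdfimages", "-png"]
    if first is not None:
        cmd += ["-f", str(first)]
    if last is not None:
        cmd += ["-l", str(last)]
    return cmd + [pdf, prefix]


def pdftoppm_cmd(pdf: str, prefix: str, dpi: int = 300) -> list[str]:
    """Render every page to a PNG at the given resolution."""
    return ["pdftoppm", "-png", "-r", str(dpi), pdf, prefix]


def time_command(cmd: list[str]) -> float:
    """Run a command (needs poppler-utils on PATH) and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start
```

With a hypothetical `doc.pdf`, `time_command(pdftoppm_cmd("doc.pdf", "page")) / time_command(pdfimages_cmd("doc.pdf", "img"))` gives the speedup ratio for that file.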