Posted by DamonHD 4 days ago
This isn't actually true. You could do this 20 years ago on a consumer laptop, and you don't need the information you get for free from text moving under a filter either.
What you need is the ability to reproduce the conditions the image was generated and pixelated/blurred under. If the pixel radius only encompasses, say, 4 characters, then you only need to search for those 4 characters first. And then you can proceed to the next few characters represented under the next pixelated block.
You can think of pixelation as a bad hash which is very easy to find a preimage for.
No motion necessary. No AI necessary. No machine learning necessary.
The hard part is recreating the environment though, and AI just means you can skip having that effort and know-how.
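A minimal sketch of that block-by-block search, under toy assumptions rather than real font rendering: `glyph` is a hypothetical stand-in that gives every character a fixed pseudo-random 4x8 bitmap, and each pixelation block covers exactly two characters. A real attack would have to reproduce the actual font, size, offset and filter. Since many pairs average to the same value, the search returns a list of candidates per block, not a single answer.

```python
import functools
import itertools
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"
GLYPH_W, GLYPH_H = 4, 8  # toy glyph size in pixels (assumption)


@functools.lru_cache(maxsize=None)
def glyph(ch):
    """Hypothetical stand-in for font rendering: fixed pseudo-random bitmap per char."""
    rng = random.Random(ord(ch))
    return tuple(tuple(rng.randrange(256) for _ in range(GLYPH_W))
                 for _ in range(GLYPH_H))


def render(text):
    """Lay glyphs side by side into a grayscale image (list of rows)."""
    return [[px for ch in text for px in glyph(ch)[y]] for y in range(GLYPH_H)]


def pixelate(img, block):
    """Reduce each block x block tile to its integer mean (one value per tile)."""
    h, w = len(img), len(img[0])
    out = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            tile = [img[y][x]
                    for y in range(by, min(by + block, h))
                    for x in range(bx, min(bx + block, w))]
            row.append(sum(tile) // len(tile))
        out.append(row)
    return out


def candidates_per_block(target, n_chars, chars_per_block=2):
    """For each block, list every character group that pixelates to the same value."""
    block = GLYPH_W * chars_per_block
    result = []
    for start in range(0, n_chars, chars_per_block):
        col = start // chars_per_block
        want = [row[col] for row in target]
        k = min(chars_per_block, n_chars - start)
        result.append([
            "".join(p) for p in itertools.product(ALPHABET, repeat=k)
            if [r[0] for r in pixelate(render("".join(p)), block)] == want
        ])
    return result
```

For example, `candidates_per_block(pixelate(render("pass1234"), 8), 8)` yields four candidate lists, each of which contains the true pair. The cost grows with the alphabet size to the power of the characters per block, not with the length of the whole string.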
If media companies want to actually censor something, nothing does better than a simple black box.
You still need to be aware of the context that you're censoring in. Just adding black boxes over text in a PDF will hide the text on the screen, but might still allow the text to be extracted from the file.
Really, if you need to censor something create a duplicate without the originals. Preferably literally without the originals as the size of the black box is also an information leak.
This was pretty different though. The decensoring algorithm I'm describing is just a linear search. But pixelation is not an invertible transformation.
Mr. Swirl Face just applied a swirl to his face, which is invertible (-ish, with some data lost), and could naively be reversed. (I am pretty sure someone on 4chan did it before the authorities did, but this might just be an Internet Legend).
Yeah but if you read about him it serves as a rallying cry for right wing types since he's an example of the Canadian legal system's extreme leniency. This guy should be in prison forever and he's been free since 2017. Look at his record of sentencing. I love being a bleeding heart liberal/progressive and all, but this is too far.
Furthermore, don't look too hard at Israel and its policy of being very, very open to pedophiles and similar types.
You'll basically want to look up the area of deconvolution. You can interpret it in linear algebra terms as trying to invert an ill-conditioned matrix, or in signal processing terms as trying to multiply by the inverse of the PSF. In real-world cases the main challenge is doing so without blowing up any error that comes from quantization noise (or other types of noise).
See https://bartwronski.com/2022/05/26/removing-blur-from-images... and https://yuzhikov.com/articles/BlurredImagesRestoration1.htm
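A rough 1D sketch of the "multiply by the inverse of the PSF" view, assuming a known kernel and circular (wrap-around) blur. The names `circular_blur`, `deconvolve` and the `eps` regulariser are illustrative choices, not taken from the linked articles. With exact floating-point data the inversion is nearly perfect; `eps` is what keeps the division from blowing up once noise or quantization enters.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform, fine for short signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def circular_blur(signal, kernel):
    """Circular convolution of signal with kernel (the PSF)."""
    n = len(signal)
    return [sum(kernel[j] * signal[(i - j) % n] for j in range(len(kernel)))
            for i in range(n)]


def deconvolve(blurred, kernel, eps=1e-12):
    """Regularised inverse filter: X = Y * conj(H) / (|H|^2 + eps)."""
    n = len(blurred)
    padded = list(kernel) + [0.0] * (n - len(kernel))
    H = dft(padded)
    Y = dft(blurred)
    X = [y * h.conjugate() / (abs(h) ** 2 + eps) for y, h in zip(Y, H)]
    return [v.real for v in dft(X, inverse=True)]
```

Rounding `blurred` to integers before calling `deconvolve` is an easy way to see the noise amplification: frequencies where `|H|` is small get divided by a small number, so a larger `eps` is needed to keep the result stable.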
But others are reversible because the information is not lost.
The details vary per transformation, and sometimes it depends on the transformation having been an imperfectly implemented one. Other times it's just that data is moved around and reduced by some reversible multiplicative factor. And so on.
More details:
The larger the group of pixels, the more characters you'd have to guess, and so the longer this would take. Each character makes it combinatorially more difficult.
To make matters worse, by the pigeonhole principle, you are guaranteed to have collisions (i.e. two different sets of characters which pixelate to the same value). E.g. a block covering just 6 characters, even if limited to a-zA-Z0-9, has 62^6 = 56800235584 possible contents, while you can expect at most 2048 color values for it to map to.
(Side note: That's 2048 colors, not 256, between #000000 and #FFFFFF. This is because your pixelation / mosaic algorithm can have eight steps inclusive between, say, #000000 and #010101. That's #000000, #000001, #000100, #010000, #010001, #010100, #000101, and #010101.
Realistically, in scenarios where you wouldn't have pixel-perfect reproduction, you'd need to generate all the combos and sort by closest to the target color, possibly also weighted by a prior on the content of the text. This is even worse, since you might have too many combinations to store.)
So, at 25 pixel blocks, encompassing many characters, you're going to have to get more creative with this. (Remember, just 6 alphanumeric characters = 56 billion combinations.)
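The arithmetic behind that can be checked in a few lines. The 2048 figure is taken as given from the comment, not derived here; the pigeonhole bound is the only thing computed.

```python
ALPHABET_SIZE = 62    # a-z, A-Z, 0-9
LENGTH = 6            # characters covered by one block
COLOR_VALUES = 2048   # figure quoted in the comment (assumption, not derived)

combos = ALPHABET_SIZE ** LENGTH
# Pigeonhole: at least one color value must have this many preimages.
min_worst_bucket = -(-combos // COLOR_VALUES)  # ceiling division

print(combos)            # 56800235584
print(min_worst_bucket)  # 27734491
```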
Thinking about this as "finding the preimage of a hash", you might take a page from the password cracking toolset and assume priors on the data. (I.e. Start with blocks of text that are more likely, rather than random strings or starting from 'aaaaaa' and counting up.)
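A small sketch of what "assume priors" could look like in code: try a likelihood-ordered wordlist first and only then fall back to exhaustive enumeration. The function name `guesses` and the idea that `wordlist` is already sorted most-likely-first are assumptions for illustration.

```python
import itertools


def guesses(wordlist, alphabet, length):
    """Yield likely strings first, then every remaining string of that length."""
    seen = set()
    for word in wordlist:  # assumed sorted most-likely first
        if len(word) == length and word not in seen:
            seen.add(word)
            yield word
    for combo in itertools.product(alphabet, repeat=length):
        s = "".join(combo)
        if s not in seen:
            yield s
```

Each guess would then be rendered and pixelated under the reproduced conditions and compared against the target block, stopping at the first match or collecting all matches.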
I recall a co-worker doing something related(?) for a kind of fun tech demo some ten years or so ago. If I recall it was shooting video while passing a slightly ajar office door. His code reconstructed the full image of the office from the "traveling slit".
I think about that all the time when I find myself in a public bathroom stall.... :-/
Walk past a closed bathroom stall fast enough and you can essentially do that with your own eyes. Or stand there and quickly shift your head side to side. Just don't do it on one that's occupied, that's not cool.
Only in the US. The rest of the world has doors without a gap at the sides.
Not every bathroom stall in the US has gaps either.
My point was that, typically, people from outside the US only ever experience toilet stalls with gaps when they visit the US.
Not every stall has gaps there, but I don't recall ever encountering it here in the EU.
You clearly haven't traveled much so you should refrain from sweeping generalisations.
https://finishlynx.com/photo-finish-trentin-sagan-tour-de-fr...
I've just moved to a house with a train line out front. I want to see if I can use a normal camera to emulate a line scan camera. I have tried with a few random YouTube videos I found [2].
I think the biggest issue I face is that there simply isn't the frame rate in most cameras to get a nicely detailed line scan effect.
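A minimal sketch of the emulation, assuming frames are plain nested lists indexed `[row][col]` and the subject moves horizontally past a fixed camera: take the same pixel column from every frame and lay the columns out along the time axis. `line_scan` is a hypothetical helper name.

```python
def line_scan(frames, column):
    """Build an image whose x-axis is time: one pixel column per video frame."""
    height = len(frames[0])
    return [[frame[y][column] for frame in frames] for y in range(height)]
```

This also shows where the frame rate limit bites: if the subject moves more than one pixel between frames, the columns in between are never sampled and the output looks stretched or gappy.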
---
[1]: https://en.wikipedia.org/wiki/Line-scan_camera
[2]: https://writing.leafs.quest/programming-fun/line-scanner
Reminds me of slit-scan as well. And of course rolling shutters.
This frontend presents them nicely: https://trains.jo-m.ch
And I'd love an archive somewhere of some of the truly awesome train art I've seen.
This method is commonly used in vision systems employing line scan cameras. They are useful in situations where the objects are moving, e.g. along conveyors.
Global shutter sensors of similar resolution are usually a bit more expensive.
With my old film cameras, at higher shutter speeds, instead of opening the entire frame, it would pass a slit of the front/rear shutter curtain over the film to just expose in a thousandth of a second or less time.
I remember my first visit to a toilet in the plush US office of a finance company and thinking WTF are they doing with their toilet cubicle? I only found out later that it's common there.
Luckily things seem to be gradually changing.
1) Open screenshot in MS-Paint (can you even install MS-Paint anymore? Or is it Paint3D now?)
2) Select Color 1: Black
3) Select Color 2: Black
4) Use rectangular selection tool to select piece of text I want to censor.
5) Click the DEL key. The rectangle should now be solid black.
6) Save the screenshot.
As far as I know, AI hasn't figured out a way to de-censor solid black yet.
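The same steps expressed on a raw pixel grid, as a sketch (grayscale image as nested lists, `black_out` a hypothetical name). The point is that the output inside the rectangle does not depend on the original pixels at all.

```python
def black_out(img, left, top, right, bottom):
    """Return a copy of img with the rectangle [left, right) x [top, bottom) set to 0."""
    return [[0 if (top <= y < bottom and left <= x < right) else px
             for x, px in enumerate(row)]
            for y, row in enumerate(img)]
```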
There was also the Android (and iOS?) truncation issue where parts of the original image were preserved if the edited image took up less space. [edit: also see replies!]
Knowing some formats have such flaws (and I'm too lazy to learn which), I think the best option is to replace step 6 with "screenshot the redacted image", so in effect it's a completely new image based on what the redacted image looks like, not on any potential intricacies of the format et al.
https://www.wired.com/story/acropalyse-google-markup-windows...
https://www.lifewire.com/acropalypse-vulnerability-shows-why...
Was a loooong time ago, so I don’t remember the details.
https://www.cnet.com/tech/tech-industry/at-38t-leaks-sensiti...
Screenshot: I'm not convinced Apple doesn't use an invisible watermark to add info to the image data. But for a fact, every photo you take with an iPhone contains an invisible watermark with your "phone serial number". To remove such watermarks, Facebook has been converting every picture you post for the last 10 years... just a weird extra con of using modern technology.
Try to copy a banknote on your printer: it will not print anything and just shows an error. Plus, every page of text printed contains barely visible yellow marks that again encode the printer's serial number.
....
Paint3D, the successor to MSPaint, is now discontinued in favor of MSPaint, which doesn't support 3d but it now has Microsoft account sign-in and AI image generation that runs locally on your Snapdragon laptop's NPU but still requires you to be signed in and connected to the internet to generate images. Hope that clears things up
I did though, under certain circumstances. Microsoft's Snipping Tool was vulnerable to the "acropalypse" vulnerability - which mostly affected the cropping functionality, but could plausibly affect images with blacked-out regions too, if the redacted region was a large enough fraction of the overall image.
The issue was that if your edited image had a smaller file size than the original, only the first portion of the file was overwritten, leaving "stale" data in the remainder, which could be used to reconstruct a portion of the unedited image.
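A sketch of that class of bug using ordinary file I/O, not the actual Snipping Tool or Android code: opening an existing file for writing without truncating it leaves the old tail in place when the new contents are shorter. All names and byte strings here are made up for the demonstration.

```python
import os
import tempfile


def save_without_truncating(path, data):
    """Buggy save: overwrites from offset 0 but keeps any bytes past len(data)."""
    with open(path, "r+b") as f:
        f.write(data)


def save_truncating(path, data):
    """Correct save: mode "wb" truncates the file before writing."""
    with open(path, "wb") as f:
        f.write(data)


def demo():
    original = b"HEADER" + b"secret pixels " * 4
    edited = b"HEADER" + b"cropped"
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        save_truncating(path, original)
        save_without_truncating(path, edited)
        with open(path, "rb") as f:
            leaked = f.read()[len(edited):]  # stale tail of the original
        save_truncating(path, edited)
        clean_size = os.path.getsize(path)
    finally:
        os.remove(path)
    return leaked, clean_size
```

An image viewer stops reading at the end marker of the edited image, so the stale tail is invisible in normal use but still present in the file.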
To mitigate this in a more paranoid way (aside from just using software that isn't broken) you could re-screenshot your edited version.
As a simple scenario with monospace font rendering, say you know someone is censoring a Windows password that is (at most) 16 characters long. This significantly narrows the search space!
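One way the search space narrows further, sketched under the assumption that the redaction box is exactly as wide as the monospace text it covers: the box width gives the length, and candidates of any other length can be discarded. `plausible_by_width` and the pixel figures are hypothetical.

```python
def plausible_by_width(candidates, box_width_px, char_width_px):
    """Keep candidates whose monospace-rendered width matches the redaction box."""
    length = round(box_width_px / char_width_px)
    return [c for c in candidates if len(c) == length]
```

For instance, with 8-pixel-wide characters and a 64-pixel box, only 8-character candidates survive.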
You don't need this step. It already defaults to black, and besides when you do "delete" it doesn't use color 1 at all, only color 2.
But most people don’t care enough.
Or I guess you could make a little video of pixelation that you just paste on top so it looks like you pixelated the thing but in reality there’s no correspondence between the original image and what’s on screen.
Or to be safe, print it and scan it, or just take a screenshot.
Normally the use case is that you still want to distribute it as a PDF, usually consisting of many pages, and without loss of quality, so the printing/scanning/screenshotting option may not be very practical.
No, the real solution is to use an editor that allows you to remove text (and/or cut out bitmaps), before you add black rectangles for clarity.
8) Scan the printed screenshot
> The reconstruction of objects from blurry images has a wide range of applications, for instance in astronomy and biomedical imaging. Assuming that the blur is spatially invariant, image blur can be defined as a two-dimensional convolution between true image and a point spread function. Hence, the corresponding deblurring operation is formulated as an inverse problem called deconvolution. Often, not only the true image is unknown, but also the available information about the point spread function is insufficient resulting in an extremely underdetermined blind deconvolution problem. Considering multiple blurred images of the object to be reconstructed, leading to a multiframe blind deconvolution problem, reduces underdeterminedness. To further decrease the number of unknowns, we transfer the multiframe blind deconvolution problem to a compact version based upon [18] where only one point spread function has to be identified.
https://www.mic.uni-luebeck.de/fileadmin/mic/publications/St...
[1] https://en.wikipedia.org/wiki/Drizzle_(image_processing)
> Moving forward, if I do have sensitive data to hide, I'll place a pure-color mask over the area, instead of a blur or pixelation effect.
Alternately - don't pixelate on a stationary grid when the window moves.
If you want it to look nicer than a color box but without giving away all the extra info when data moves between pixels, pixelate it once and overlay with a static screenshot of that.
For bonus points, you could automate scrambling the pixelation with fake-but-real-looking pixelation. Would be nice if video editing tools had that built in for censoring, knowing that pixelation doesn't work but people will keep thinking it does.
I wonder if it might be good for the blur/censor tools (like on YouTube's editor even) to do an average color match and then add in some random noise to the area that's selected...
Would definitely save people from some hassle.
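A sketch of that idea on a grayscale pixel grid: fill the selected area with its average value plus random noise. `redact_with_noise`, `spread` and `seed` are illustrative names. Note that the region's average still leaks, which is why two regions with the same average produce the same output below.

```python
import random


def redact_with_noise(img, left, top, right, bottom, spread=20, seed=None):
    """Fill [left, right) x [top, bottom) with the region's mean plus random noise."""
    rng = random.Random(seed)
    region = [img[y][x] for y in range(top, bottom) for x in range(left, right)]
    mean = sum(region) // len(region)
    out = [row[:] for row in img]  # copy so the input stays untouched
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = max(0, min(255, mean + rng.randint(-spread, spread)))
    return out
```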
That moving pixelation look is definitely cooler though. If you wanted to keep it without leaking data you could do the motion tracked screenshot step first (not pixelated, but text all replaced by lorem ipsum or similar) and then run the pixelation over top of that.
If any of you nerds reading this are into video editing, please steal this idea and automate it.
A pixelization filter at least actively removes information from an image. A Gaussian or box blur is straight-up invertible by deconvolution, and the only reason that doesn't work out of the box is that the blurring is done with low precision (e.g. directly on 8-bit sRGB) or quantized to a low-precision format afterwards.
Today we take for granted the ability to conjure a complicated pseudorandom digital stream for keying, but in those days it was just "no can do".
In WWII... SIGSALY was the first system secure by modern standards. Pairs of synchronized one-time phonographic records containing a sequence of tones seeded from a noise source.
Unblurring is an extremely ill-posed problem so any noise or modelling errors get massively amplified.
It only works in this case because there is essentially zero noise, and the correlation between source frames is an exact move.
Yes I would stake hypothetical customer data on this.
“It’s” are the address lines, which are blurred instead of blacked or whited out, potentially revealing customers' private information.
In France, public television raised the alarm a few years ago about the anonymization of voices and the blurring of faces of investigative sources. As a result, an anonymization policy has been implemented requiring that voices be replaced by an actor reading the text and that people be filmed from behind at the very least, or even replaced by an actor altogether.
More problematic are the archives of past investigations, which put people like political dissidents and witnesses against organized crime or the mafia at high risk of suffering retaliation. To mitigate this risk, tens of videos have been taken down, and efforts have been made to contact those who may be affected.
https://larevuedesmedias.ina.fr/urgence-france-televisions-i... (in French)
In this work we find that many current redactions of PDF text are insecure due to non-redacted character positioning information. In particular, subpixel-sized horizontal shifts in redacted and non-redacted characters can be recovered and used to effectively deredact first and last names.
Information-theoretic attacks work on black boxes too