
Posted by zdw 5 days ago

It's all a blur (lcamtuf.substack.com)
338 points | 62 comments
esafak 10 hours ago|
This is classical deconvolution. Modern de-blurring implementations are DNN-based.
praptak 13 hours ago||
My (admittedly superficial) knowledge about blur reversibility is that an attacker may know what kind of stuff is behind the blur.

I mean knowledge like "a human face, but the potential set of humans is known to the attacker" or even worse "a text, but the font is obvious from the unblurred part of the doc".

jonathanlydall 13 hours ago||
This was also my understanding.

It's essentially like "cracking" a password when you have its hash and know the hashing algorithm. You don't have to know how to reverse the blur; you just need to know how to apply it the normal way. You can then essentially brute-force through all possible characters one at a time and check whether the result looks the same after applying the blur.
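In code, the idea is roughly this (a sketch only; "render" and "blur" here are hypothetical stand-ins for "draw a candidate as an image" and "the same blur the redactor used"):

    # Brute-force a blurred region by re-applying the blur to known candidates.
    import numpy as np

    def crack(blurred_region, candidates, render, blur):
        best, best_err = None, np.inf
        for text in candidates:          # e.g. every name on a known list
            guess = blur(render(text))   # forward direction only, no inversion needed
            err = np.mean((guess - blurred_region) ** 2)
            if err < best_err:
                best, best_err = text, err
        return best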

Thinking about this, adding randomness to the blurring would likely help.

Or far more simply, just mask the sensitive data with a single color which is impossible to reverse (for rasterized images, this is not a good idea for PDFs, which tend to keep the text "hidden" underneath).

swiftcoder 11 hours ago|||
> mask the sensitive data with a single color which is impossible to reverse

You note the pitfall of text remaining behind the redaction in PDFs (and other layered formats), but there are also pitfalls here around alpha channels. There have been several incidents where folks drew not-quite-opaque redaction blocks over their images.

yetihehe 11 hours ago|||
> just mask the sensitive data with a single color which is impossible to reverse (for rasterized images, this is not a good idea for PDFs

Also not a good idea for masking already-compressed images of text, like JPEGs, because some of the information might bleed out into uncovered areas.

johnmaguire 10 hours ago||
Interesting - does a little extra coverage solve this or is it possible to use distant pixels to find the original?
wheybags 8 hours ago|||
I'm not super familiar with the jpeg format, but iirc h.264 uses 16x16 blocks, so if jpeg is the same then padding of 16px on all sides would presumably block all possible information leakage?

Except the size of the blocked section ofc. E.g. if you know it's a person's name from a fixed list of people, well, "Huckleberry" and "Tom" are very different lengths.

sebastianmestre 8 hours ago|||
yep, some padding fixes this

JPEG compression can only move information at most 16px away, because it works on 8x8 pixel blocks, and on a 2x down-sampled version of the chroma channels of the image (at least in its most common form).
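A rough sketch of that padding (assuming the usual 8x8 blocks with 2x2 chroma subsampling, i.e. 16x16 MCUs; the helper name is made up):

    # Grow a redaction rectangle outward to the nearest 16px MCU boundaries.
    def pad_to_mcu(x0, y0, x1, y1, mcu=16):
        return (x0 // mcu * mcu, y0 // mcu * mcu,
                -(-x1 // mcu) * mcu, -(-y1 // mcu) * mcu)  # floor/ceil to block edges

    # e.g. pad_to_mcu(37, 21, 190, 55) -> (32, 16, 192, 64)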

oulipo2 13 hours ago||
The countermeasure is easy: just add a small amount of random noise (even invisible to the human eye) to the blurred picture, and suddenly the "blur inversion" fails spectacularly.
sebzim4500 12 hours ago||
Does this actually work? I would have thought that, given the deconvolution step is just a linear operator with reasonable coefficients, adding a small amount of noise to the blurred image would just add a similarly small amount of noise to the unblurred result.
srean 12 hours ago||
To reconstruct the image, one has to cut off those frequencies in the corrupted image where the signal-to-noise ratio is poor. In many original images the signal in the high frequencies can be sacrificed, so get rid of those and then invert.

https://en.wikipedia.org/wiki/Wiener_deconvolution

If one blindly inverts the linear blur transform then yes, the reconstruction would usually be a completely unrecognisable mess, because the inverse operator is going to dramatically boost the noise as well.
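A minimal 1-D sketch of that (numpy only; assumes the blur kernel h is known, circular convolution, and a flat noise-to-signal estimate):

    import numpy as np

    def wiener_deconvolve(y, h, nsr=1e-2):
        n = len(y)
        H = np.fft.fft(h, n)                      # transfer function of the blur
        G = np.conj(H) / (np.abs(H) ** 2 + nsr)   # damps bins where |H| is small
        return np.real(np.fft.ifft(G * np.fft.fft(y)))

Setting nsr to zero turns this back into the naive inverse filter, which is exactly the version that blows up on noisy input.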

jfaganel99 10 hours ago||
How do we apply this to geospatial face and licence plate blurs?
IshKebab 8 hours ago||
In practice unblurring (deconvolution) doesn't really work as well as you'd hope because it is usually blind (you don't know the blur function), and it is ill-conditioned, so any small mistakes or noise get enormously amplified.
jkuli 6 hours ago||
A simple solution is to set it up as a system of linear equations, Ax = b. Each row of the matrix A is one linear equation: it contains the kernel weights over the image x, and the corresponding entry of b is the blurred pixel color. The full matrix would be a terabyte, so take advantage of the zeros and use an efficient sparse solver for x instead of inverting the matrix.

"Enhance" really refers to combining multiple images (stacking). Each pixel in a low-res image is a kernel applied over the same high-res scene, so undoing a 100-pixel blur is comparable to combining 10,000 images for 100x super-resolution.
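A rough 1-D sketch of that setup (scipy assumed): a box blur written out as a sparse band matrix and solved with LSQR rather than inverted.

    import numpy as np
    from scipy import sparse
    from scipy.sparse.linalg import lsqr

    n, k = 1000, 7                                   # signal length, box-blur width
    rows = np.repeat(np.arange(n - k + 1), k)
    cols = rows + np.tile(np.arange(k), n - k + 1)
    A = sparse.csr_matrix((np.full(rows.size, 1.0 / k), (rows, cols)),
                          shape=(n - k + 1, n))      # one row per blurred output pixel
    x_true = np.random.rand(n)
    b = A @ x_true                                   # the blurred observation
    x_rec = lsqr(A, b)[0]                            # sparse least-squares solve, no inversion

(The system is slightly under-determined at the boundaries, so LSQR returns a least-squares estimate there rather than an exact recovery.)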

zb3 9 hours ago||
OK, what about Gaussian blur?
unconed 10 hours ago||
Sorry but this post is the blind leading the blind, pun intended. Allow me to explain, I have a DSP degree.

The reason the filters used in the post are easily reversible is because none of them are binomial (i.e. the discrete equivalent of a gaussian blur). A binomial blur uses the coefficients of a row of Pascal's triangle, and thus is what you get when you repeatedly average each pixel with its neighbor (in 1D).

When you do, the information at the Nyquist frequency is removed entirely, because a signal of the form "-1, +1, -1, +1, ..." ends up blurred _exactly_ into "0, 0, 0, 0...".

All the other blur filters, in particular the moving average, are just poorly conceived. They filter out the middle frequencies the most, not the highest ones. It's equivalent to doing a bandpass filter and then subtracting that from the original image.

Here's an interactive notebook that explains this in the context of time series. One important point is that the "look" that people associate with "scientific data series" is actually an artifact of moving averages. If a proper filter is used, the blurriness of the signal is evident. https://observablehq.com/d/a51954c61a72e1ef
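A quick numerical check of the Nyquist point, as a sketch (numpy assumed): a 5-tap binomial kernel has an exact zero at the Nyquist frequency, while a 5-tap moving average does not.

    import numpy as np

    def mag_response(kernel, n=512):
        w = np.linspace(0, np.pi, n)                        # 0 .. Nyquist
        z = np.exp(-1j * np.outer(w, np.arange(len(kernel))))
        return np.abs(z @ kernel)                           # |H(w)|

    box = np.ones(5) / 5
    binom = np.array([1, 4, 6, 4, 1]) / 16.0                # row of Pascal's triangle
    print(mag_response(box)[-1], mag_response(binom)[-1])   # ~0.2 vs ~0 at Nyquist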

jerf 8 hours ago||
"In today’s article, we’ll build a rudimentary blur algorithm and then pick it apart."

Emphasis mine. Quote from the beginning of the article.

This isn't meant to be a textbook about blurring algorithms. It was supposed to be a demonstration of how what may seem destroyed to a casual viewer is recoverable by a simple process, intended to give the viewer some intuition that maybe blurring isn't such a good information destroyer after all.

Your post kind of comes off like criticizing someone for showing how easy it is to crack a Caesar cipher for not using AES-256. But the whole point was to be accessible, and to introduce the idea that just because it looks unreadable doesn't mean it's not very easy to recover. No, it's not a mistake to be using the Caesar cipher for the initial introduction. Or a dead-simple one-dimensional blurring algorithm.

the_fall 6 hours ago|||
If you have an endless pattern of ..., -1, 1, -1, 1, -1, 1, ... and run a box blur with a window of 2 or 4, you get ..., 0, 0, 0, 0, 0, 0, ... too.
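A one-line check of that (numpy assumed):

    import numpy as np
    x = np.array([-1, 1] * 8)
    print(np.convolve(x, np.ones(2) / 2, mode='valid'))   # -> all zeros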

Other than that, you're not wrong about theoretical Gaussian filters with infinite windows over infinite data, but this has little to do with the scenario in the article. That's about the information that leaks when you have a finite window with a discrete step and start at a well-defined boundary.

yunnpp 8 hours ago|||
Interesting... I've used moving averages without thinking too hard about the underlying implications. Do you recommend any particular book or resource on DSP basics for the average programmer?
jszymborski 9 hours ago||
> Sorry but this post is the blind leading the blind, pun intended. Allow me to explain, I have a DSP degree.

FWIW, this does not read as constructive.

Sesse__ 4 hours ago||
It also makes no sense to me, and I also have a DSP degree. Of course moving averages (aka box blurs) filter out higher frequencies more than middle frequencies.
oulipo2 13 hours ago||
Those unblurring methods look "amazing" like that, but they are just very fragile: add even a modicum of noise to the blurred image and the deblurring will almost certainly fail completely. This is well known in signal processing.
srean 12 hours ago||
Not necessarily.

If, however, one just blindly uses the (generalized) inverse of the point-spread function, then you are absolutely correct for the common point-spread functions that we encounter in practice (usually very poorly conditioned).

One way to deal with this is to cut off those frequencies where the signal-to-noise ratio in that frequency bin is poor. This, however, requires some knowledge about the spectrum of the noise and the signal. The Wiener filter uses that knowledge to work out an optimal filter.

https://en.wikipedia.org/wiki/Wiener_deconvolution

If one knows neither the statistics of the noise nor the point-spread function, then it gets harder and you are in the territory of blind deconvolution.

So just a word of warning: if you are relying only on sprinkling a little noise on blurred images to save yourself, you are on very, very dangerous ground.

matsemann 9 hours ago||
Did you see the part where he saved the image with more and more lossy compression and showed that it was still recoverable?
chenmx 10 hours ago|
What I find fascinating about blur is how computational photography has completely changed the game. Smartphone cameras now capture multiple exposures and computationally combine them, essentially solving the deblurring problem before it even happens. The irony is that we now have to add blur back artificially for portrait mode bokeh, which means we went from fighting blur to synthesizing it as a feature.