Posted by rbanffy 9/4/2025
How do you even begin to think of such things? Some people are wired differently.
Energy can't be created or destroyed, so it follows a continuity equation: du/dt + dq/dx = 0, where u is the energy density and q is the flux. Roughly, the only way for energy to change in time is by flowing in from (or out to) somewhere in space. There are no magic sources/sinks (a source or sink would be a nonzero term on the right).
Then you have Fourier's law/Newton's law of cooling: heat flows down the temperature gradient, from hot to cold: q = -du/dx.
Combining these, you get the heat equation: du/dt = d^2 u/dx^2.
Now if you're very fancy, you can find deeper reasons for this, but otherwise if you're in engineering analysis class, just guess that u(t,x)=T(t)X(x). i.e. it cleanly factors along time/space.
But then T'(t)X(x) = X''(x)T(t), so T'(t)/T(t) = X''(x)/X(x). But the left and right are functions of different independent variables, so they must both equal some constant λ. So you get X'' = λX; boundedness (or the boundary conditions) forces λ to be negative, say λ = -k^2, and then from calc 1, X is sin/cos.
Likewise T' = λT, so T is e^(λt) = e^(-k^2 t) from calc 1, which decays in time.
Then since it's a linear differential equation, the most general solution (assuming it splits the way we guessed) is a weighted sum of any allowable T(t)X(x), so you get a sum of exponentially decaying (in time) waves (in space).
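A minimal numerical sketch of that picture, assuming a rod on [0, pi] with the ends held at zero (the initial condition and mode count here are just for illustration): project the initial temperature onto sine modes and let each mode decay at its own rate e^(-k^2 t).

```python
import numpy as np

# Solve u_t = u_xx on [0, pi] with u(t,0) = u(t,pi) = 0 by separation of variables:
# u(t,x) = sum_k b_k * exp(-k^2 t) * sin(k x), where b_k are the sine-series
# coefficients of the initial condition u(0,x).

x = np.linspace(0, np.pi, 201)
u0 = np.where((x > 1.0) & (x < 2.0), 1.0, 0.0)   # arbitrary initial "hot spot"

K = 50                      # number of modes to keep (illustrative)
k = np.arange(1, K + 1)

# b_k = (2/pi) * integral of u0(x) sin(k x) dx  (projection onto each mode)
b = 2 / np.pi * np.trapz(u0[None, :] * np.sin(k[:, None] * x[None, :]), x, axis=1)

def u(t):
    """Temperature profile at time t: each sine mode decays like exp(-k^2 t)."""
    return (b * np.exp(-k**2 * t)) @ np.sin(k[:, None] * x[None, :])

for t in (0.0, 0.01, 0.1, 1.0):
    print(f"t={t:5.2f}  max temperature = {u(t).max():.3f}")
```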
Other mathematicians before Fourier had used trigonometric series to study waves, and physicists already understood harmonic superposition on eg a vibrating string. I don't have the source but I believe Gauss even noted that trigonometric series were a solution to the heat equation. Fourier's contribution was discovering that almost any function, including the general solution to the heat equation, could be modelled this way, and he provided machinery that let mathematicians apply the idea to an enormous range of problems.
A common mistake I see in people reading mathematics (or even computer science papers) is to think the proof set out in the paper is the thought process that led to the interesting insight. It is almost always an ex post facto formalisation.
https://www.youtube.com/watch?v=spUNpyF58BY&list=PL4VT47y1w7...
Rather than talking about sine and cosine waves, they motivate the Fourier transform entirely in terms of polynomials. Imagine you want to multiply two polynomials (p(x) and q(x)). The key is to recognize that there are two ways to represent each polynomial:
1. "Coefficient form," as a set of coefficients [p_0, p_1, p_2, ..., p_d] where p(x) = p_0 + p_1x + p_2x^2 + ... + p_dx^d, OR
2. "Sample form," as a set of sampled points from each polynomial, like [(0, p(0)), (1, p(1)), (2, p(2)), ..., (d, p(d))]
Now, naive multiplication of p(x) and q(x) in coefficient form takes O(d^2) scalar multiplications to get the coefficients of p(x)q(x). But if you have p(x) and q(x) in sample form, it's clear that the sample form of p(x)q(x) is just [(0, p(0)q(0)), (1, p(1)q(1)), ...], which requires only O(d) multiplications!
As long as you have enough sample points relative to the degree, these two representations are equivalent (two points uniquely define a line, three a quadratic, four a cubic, etc.). The (inverse) Fourier transform is just a function that witnesses this equivalence, i.e., it maps from representation (1) to representation (2) (and vice-versa). If the sample points are chosen cleverly (the complex roots of unity rather than just 1/2/3/...), it actually becomes possible to compute the Fourier transform in O(d log d) time with a divide-and-conquer algorithm (the FFT).
So, long story short, if you want to multiply p(x) and q(x), it's best to first convert them to "sample" form (O(d log d) time using the FFT), then multiply the sample forms pointwise to get the sample form of p(x)q(x) (O(d) time), and then finally convert them back to the "coefficient" form (O(d log d) using the inverse FFT).
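Here's a minimal sketch of that pipeline with numpy (the sample points are the roots of unity that np.fft.fft uses implicitly; the helper name is mine):

```python
import numpy as np

def poly_multiply(p, q):
    """Multiply two polynomials given as coefficient lists [p0, p1, ..., pd]
    (lowest degree first) via FFT: coefficients -> samples -> pointwise
    product -> coefficients."""
    n = len(p) + len(q) - 1            # product has degree deg(p) + deg(q)
    n_fft = 1 << (n - 1).bit_length()  # enough sample points, rounded to a power of 2

    # "Coefficient form" -> "sample form" at the n_fft-th roots of unity.
    P = np.fft.fft(p, n_fft)
    Q = np.fft.fft(q, n_fft)

    # Pointwise multiplication of the sample forms: O(d) work.
    R = P * Q

    # Back to coefficient form.
    r = np.fft.ifft(R).real[:n]
    return np.round(r).astype(int)     # coefficients were integers in this example

# (1 + 2x)(3 + 4x + 5x^2) = 3 + 10x + 13x^2 + 10x^3
print(poly_multiply([1, 2], [3, 4, 5]))   # -> [ 3 10 13 10]
```

Padding the length up to a power of two isn't required by np.fft.fft, but it keeps the transform fast.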
Turns out... they are not! You can do the same thing using a different set of functions, like Legendre polynomials, or wavelets.
Yup, any set of orthogonal functions! The special thing about sines is that they form an exceptionally easy-to-understand orthogonal basis, with a bunch of other nice properties to boot.
Which to your point: You're absolutely correct that you can use a bunch of different sets of functions for your decomposition. Linear algebra just says that you might as well use the most convenient one!
For someone reading this with only a calculus background, an example of this is that you get back a sine (times a constant) if you differentiate it twice, i.e. d^2/dt^2 sin(nt) = -n^2 sin(nt). Put technically, sines/cosines are eigenfunctions of the second derivative operator. This turns out to be really convenient for a lot of physical problems (e.g. wave/diffusion equations).
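A quick numerical sanity check of both claims (orthogonality over a full period, and the -n^2 eigenvalue), using nothing but numpy:

```python
import numpy as np

t = np.linspace(0, 2 * np.pi, 10001)
dt = t[1] - t[0]

# Orthogonality: integral of sin(m t) sin(n t) over a full period is 0 for m != n.
for m, n in [(1, 2), (2, 3), (3, 3)]:
    inner = np.trapz(np.sin(m * t) * np.sin(n * t), t)
    print(f"<sin({m}t), sin({n}t)> ~ {inner:.4f}")

# Eigenfunction property: d^2/dt^2 sin(n t) = -n^2 sin(n t).
n = 3
second_deriv = np.gradient(np.gradient(np.sin(n * t), dt), dt)
# Compare away from the endpoints, where the finite differences are one-sided.
print(np.allclose(second_deriv[100:-100], -n**2 * np.sin(n * t)[100:-100], atol=1e-3))
```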
Like you can make any vector in R^3 `<x,y,z>` by adding together a linear combination of `<1,0,0>`, `<0,1,0>`, `<0,0,1>`, turns out you can also do it using `<exp(j2pi*0*0/3), exp(j2pi*0*1/3), exp(j2pi*0*2/3)>`, `<exp(j2pi*1*0/3), exp(j2pi*1*1/3), exp(j2pi*1*2/3)>`, and `<exp(j2pi*2*0/3), exp(j2pi*2*1/3), exp(j2pi*2*2/3)>` (i.e. the vectors `<exp(j2pi*k*n/3)>` with n = 0, 1, 2, one for each k = 0, 1, 2).
You can actually do it with a lot of different bases. You just need them to be linearly independent.
For the continuous case, it isn't all that different from how you can use a linear combination of polynomials 1,x,x^2,x^3,... to approximate functions (like Taylor series).
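A tiny concrete version of the R^3 example above, assuming nothing beyond numpy: express an arbitrary vector as a weighted sum of those three complex-exponential vectors. Finding the weights is exactly what the length-3 DFT does.

```python
import numpy as np

N = 3
n = np.arange(N)

# The three basis vectors <exp(j*2*pi*k*n/3)> for k = 0, 1, 2.
basis = np.array([np.exp(2j * np.pi * k * n / N) for k in range(N)])

v = np.array([1.0, 4.0, -2.0])        # any vector works; this one is arbitrary

# Weight on each basis vector (the DFT, up to normalization), using orthogonality.
weights = basis.conj() @ v / N

# Recombine: the weighted sum of basis vectors reproduces v.
reconstructed = weights @ basis
print(np.allclose(reconstructed, v))   # True (imaginary parts cancel to ~0)
```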
And, with your own drawing: https://gofigure.impara.ai
Essentially it's just projection in infinite-dimensional vector spaces.
Luckily, we live in a physical universe, where such mathematical oddities, like infinite-bandwidth signals, cannot exist, so this isn't an actual issue. Any signal that contains infinite bandwidth only does so because it has sampling artifacts, so you would, necessarily, be attempting to reconstruct errors. There are many "tricks" for dealing with such flawed signals. But yes, you can't fully reconstruct impossible signals with an FFT.
The real world is somewhere in between. It must involve quantum mechanics (in a way I don't really understand), as maximum bandwidth/minimum wavelength bump up against limits such as the Planck length and virtual particles in a vacuum.
An interesting anecdote from Lanczos[1] claims that Michelson (of interferometer fame) observed Gibbs ringing when he tried to reconstruct a square wave on what amounted to a steampunk Fourier analyzer [2]. He reportedly blamed the hardware for lacking the necessary precision.
1: https://math.univ-lyon1.fr/wikis/rouge/lib/exe/fetch.php?med...
2: https://engineerguy.com/fourier/pdfs/albert-michelsons-harmo...
For example, one viewpoint is that "Gibbs ringing" is always present if the bandwidth is limited, just that in the "non-aliased" case the sampling points have been chosen to coincide with the zero-crossings of the Gibbs ringing.
I find that my brain explodes each time I pick up the Fourier Transform, and it takes a few days of exposure to simultaneously get all the subtle details back into my head.
No amount of precision, no number of coefficients, no degree of lowpass filtering can get around the fact that sin(x)/x never decays all the way to zero. So if you don't have an infinitely-long (or seamlessly repeating) input signal, you must apply something besides a rectangular window to it or you will get Gibbs ringing.
There is always more than one way to look at these phenomena, of course. But I don't think the case can be made that bandlimiting has anything to do with Gibbs.
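To make the windowing point concrete, here's a small numpy sketch (sample rate and tone frequency are arbitrary): a sinusoid that doesn't complete a whole number of cycles in the analysis length, analyzed with a rectangular window vs a Hann window. The rectangular window's sinc tails smear energy across the whole spectrum; the Hann window tapers the edges and pushes that leakage way down.

```python
import numpy as np

fs = 1000.0                        # sample rate in Hz, arbitrary for illustration
N = 1024
t = np.arange(N) / fs
x = np.sin(2 * np.pi * 52.3 * t)   # 52.3 Hz: not a whole number of cycles in N samples

def spectrum_db(signal):
    """Magnitude spectrum in dB, normalized to the peak bin."""
    mag = np.abs(np.fft.rfft(signal))
    return 20 * np.log10(mag / mag.max() + 1e-12)

rect = spectrum_db(x)                      # rectangular window (i.e. no window at all)
hann = spectrum_db(x * np.hanning(N))      # Hann window tapers the edges to zero

freqs = np.fft.rfftfreq(N, 1 / fs)
far = freqs > 200                          # bins far away from the 52.3 Hz tone
print("max leakage 200+ Hz away, rectangular: %.1f dB" % rect[far].max())
print("max leakage 200+ Hz away, Hann:        %.1f dB" % hann[far].max())
```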
I slightly object to this. Removing small details = blurring the image, which is actually quite noticeable.
For some reason everyone really wants to assume this is true, so for the longest time people would invent new codecs that were prone to this (in particular wavelet-based ones like JPEG-2000 and Dirac) and then nobody would use them because they were blurry. I think this is because it's easy to give up on actually looking at the results of your work and instead use a statistic like PSNR, which turns out to be easy to cheat.
For vision we are much more sensitive to large scale detail (corresponding to low frequency FFT components) than to fine scale detail (corresponding to high frequency components). So, given the goal of minimizing the reduction in perceived quality, this is an obvious place to start: throw away some of that fine detail (the highest frequency FFT components), and it may not even be noticeable at all if the detail you throw away is finer than we are able to perceive.
It turns out that human vision is also more sensitive to brightness than to color (due to the numbers of retinal rods vs cones, etc.), so compression can take advantage of that to minimize perceptual degradation, which is what JPEG does. First convert the image from RGB to the YUV color space, where the Y component corresponds to brightness and the U, V components carry the color information. Then compress the color information more heavily than the brightness, by separately applying an FFT (actually a DCT) to each of the Y, U, V components and throwing away more of the high frequency (fine detail) color information than brightness information.
But, yeah, there is no magic and lossy compression is certainly going to be increasingly noticeable the more heavily you compress.
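As a rough sketch of the chroma-vs-luma part (the luma weights are the standard 0.299/0.587/0.114; the synthetic image and the simple 2x2 averaging are just for illustration): convert RGB to a luma plane plus two color-difference planes, keep luma at full resolution, and halve the resolution of the color planes before converting back.

```python
import numpy as np

def rgb_to_ycc(rgb):
    """Split an RGB image (H, W, 3) into luma Y and two chroma-difference planes."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b   # standard luma weights
    return y, b - y, r - y                   # (Y, Cb-like, Cr-like)

def ycc_to_rgb(y, cb, cr):
    r, b = y + cr, y + cb
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return np.stack([r, g, b], axis=-1)

def downsample2x(plane):
    """Average 2x2 blocks, then repeat each value back up to full size."""
    h, w = plane.shape
    small = plane.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return np.kron(small, np.ones((2, 2)))

img = np.random.default_rng(0).random((64, 64, 3))   # stand-in for a real image

y, cb, cr = rgb_to_ycc(img)
out = ycc_to_rgb(y, downsample2x(cb), downsample2x(cr))   # 4:2:0-style chroma loss

print("RGB error with full-res luma, half-res chroma:", np.abs(out - img).mean())
```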
This isn't true in practice - images are not bandlimited like audio, so there aren't really visual elements of images corresponding to low frequency cosine waves. That's why the lowest frequency DCT coefficient in a JPEG image covers at most a 16x16 pixel block (8x8 for luma), which is hardly large scale.
But you do quantize all components of the DCT transform, not just the highest ones.
Actually in the default JPEG quantization matrix it's the coefficient to the upper-left of the last one that gets the most quantization: https://en.wikipedia.org/wiki/Quantization_(image_processing...
In terms of understanding how JPEG compression works, and how it relates to human perception, I'd say that in order of importance it's:
1) Throw away fine detail by discarding high frequency components
2) More heavily compress/discard color than brightness detail (using YUV)
3) Quantize the frequency components you are retaining
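A minimal sketch of steps 1 and 3 on a single 8x8 block, using scipy's DCT and the standard JPEG luminance quantization table (the block itself is synthetic):

```python
import numpy as np
from scipy.fft import dctn, idctn

# Standard JPEG luminance quantization table (quality ~50).
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

# Synthetic, mostly smooth 8x8 pixel block (level-shifted to be centered on 0).
rng = np.random.default_rng(0)
xx, yy = np.meshgrid(np.arange(8), np.arange(8))
block = 60 * np.sin(xx / 3) + 40 * np.cos(yy / 5) + 5 * rng.standard_normal((8, 8))

coeffs = dctn(block, norm='ortho')              # 2D DCT: pixel block -> frequency block
quantized = np.round(coeffs / Q)                # coarse steps kill most high-freq terms
print("nonzero coefficients:", np.count_nonzero(quantized), "of 64")

restored = idctn(quantized * Q, norm='ortho')   # decoder: dequantize and inverse DCT
print("mean absolute pixel error:", np.abs(restored - block).mean())
```

Most of the quantized high-frequency coefficients come out as zero, which is what the entropy coding stage then exploits.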
The reason it works is that fine detail is almost completely correlated across colors, so if you only keep the Y plane at full resolution, that detail is still stored.
You couldn't just throw it out in RGB space because eg text would be unreadable.
It doesn't have the same effect because the original high frequency details are correlated and so they're preserved in the Y channel.
Also, a related fact is that the derivative of a sine wave is the same wave shifted by pi/2, while exponential curves need no shifting (the exponential is its own derivative, up to a constant). I'm not a professional in these matters, but I guess there is a deeper connection between the rate of change and the static world.
That sounds complicated, but if you look at the integral, the exponential kernel is essentially a continuous "matrix" and the function you are integrating against that kernel is a continuous vector.
This observation can be a guide to better understanding infinite-dimensional Hilbert spaces (inner product spaces + some extra structure), and it is one of the core observations in quantum mechanics, where it is part of the wave-particle concept: the Fourier transform maps position space -> momentum space.
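That continuous-matrix picture has an exact discrete counterpart you can poke at: the DFT is literally multiplication by an N x N matrix of exponentials, and np.fft.fft is just a fast way of applying it.

```python
import numpy as np

N = 8
n = np.arange(N)

# The "matrix" of the transform: F[k, n] = exp(-2j*pi*k*n/N).
F = np.exp(-2j * np.pi * np.outer(n, n) / N)

x = np.random.default_rng(0).standard_normal(N)   # any "vector" (a sampled function)

# Applying the kernel as a matrix gives exactly the DFT that np.fft.fft computes.
print(np.allclose(F @ x, np.fft.fft(x)))   # True
```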