Posted by ibobev 2 days ago
The great philosophical question is why CLT applies so universally. The article explains it well as a consequence of the averaging process.
Alternatively, I’ve read that natural processes tend to exhibit Gaussian behaviour because there is a tendency towards equilibrium: forces, homeostasis, central potentials and so on and this equilibrium drives the measurable into the central region.
For processes such as prices in financial markets, with complicated feedback loops and reflexivity (in the Soros sense), the probability mass tends to end up in the non-central region, where the CLT does not apply.
In finance, the effects of random factors tend to multiply. So you get a log-normal curve.
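A quick sketch of the multiplicative story (the shock range below is a made-up illustrative choice): the log of a product of many small positive factors is a sum, so the CLT pushes the log toward Gaussian, i.e. the product toward log-normal.

```python
import numpy as np

rng = np.random.default_rng(0)

# A price built from many small multiplicative shocks:
# P_n = P_0 * r_1 * ... * r_n, each r_i > 0 (the range is arbitrary).
n_steps, n_paths = 200, 100_000
shocks = rng.uniform(0.99, 1.02, size=(n_paths, n_steps))
prices = shocks.prod(axis=1)

# log(P_n) is a *sum* of log-shocks, so the CLT applies to the log:
log_prices = np.log(prices)

# Skewness of a Gaussian sample is ~0; near-zero here means the log is
# roughly normal, i.e. the price itself is roughly log-normal.
z = (log_prices - log_prices.mean()) / log_prices.std()
skew = float(np.mean(z**3))
print(round(skew, 3))
```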
As Taleb points out, though, the underlying assumptions behind the log-normal break down in large market movements, because in large movements things that were uncorrelated become correlated. The result is fat tails, where extreme combinations of events (aka "black swans") become far more likely than naively expected.
a) the CLT requires samples drawn from a distribution with finite mean and variance
and b) the Gaussian is the maximum entropy distribution for a particular mean and variance
I’d be curious about what happens if you start making assumptions about higher-order moments in the distro
The most interesting assumptions to relax are the independence assumptions. They're way more permissive than the textbook version suggests. You need dependence to decay fast enough, and mixing conditions (α-mixing, strong mixing) give you exactly that: correlations that die off let the CLT go through essentially unchanged. Where it genuinely breaks is long-range dependence: fractionally integrated processes, Hurst parameter above 0.5, where autocorrelations decay hyperbolically instead of exponentially. There the √n normalization is wrong, you get different scaling exponents, and sometimes non-Gaussian limits.
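A toy check of the short-range case, assuming an AR(1) process (the parameters are arbitrary): geometrically decaying correlations leave the CLT intact, but the limiting variance becomes the long-run variance 1/(1−φ)² rather than the marginal variance 1/(1−φ²).

```python
import numpy as np

rng = np.random.default_rng(1)

# AR(1): x_t = phi * x_{t-1} + eps_t with N(0,1) noise. Correlations decay
# geometrically (strong mixing), so the CLT survives -- but the limiting
# variance of sqrt(n) * mean is 1/(1-phi)^2, not Var(x) = 1/(1-phi^2).
phi, n, n_paths = 0.5, 1000, 4000
x = np.zeros((n_paths, n))
for t in range(1, n):
    x[:, t] = phi * x[:, t - 1] + rng.standard_normal(n_paths)

scaled_means = np.sqrt(n) * x.mean(axis=1)
print(round(float(scaled_means.var()), 2))  # ~4, i.e. 1/(1-0.5)^2
```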
There are also interesting higher order terms. The √n is specifically the rate that zeroes out the higher-order cumulants. Skewness (third cumulant) decays at 1/√n, excess kurtosis at 1/n, and so on up. Edgeworth expansions formalize this as an asymptotic series in powers of 1/√n with cumulant-dependent coefficients. So the Gaussian is the leading term of that expansion, and Edgeworth tells you the rate and structure of convergence to it.
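The 1/√n skewness decay is easy to see empirically; a minimal sketch using sums of Exponential(1) draws (my choice of example), whose sum is Gamma(n) with skewness exactly 2/√n:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_skewness(s):
    z = (s - s.mean()) / s.std()
    return float(np.mean(z**3))

# Sums of n Exponential(1) draws form a Gamma(n) variable whose skewness
# is exactly 2/sqrt(n) -- the 1/sqrt(n) decay of the third-cumulant term.
skews = {}
for n in (4, 16, 64):
    sums = rng.exponential(size=(200_000, n)).sum(axis=1)
    skews[n] = sample_skewness(sums)
    print(n, round(skews[n], 3), round(2 / np.sqrt(n), 3))
```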
(I know it is very easy to do "maths" this way).
If I'm remembering it correctly, it's interesting to think about the ramifications of that for the moments.
To me it results from two factors: 1. the Gaussian is the max-entropy distribution for a given variance, and 2. variance is the model of energy-limited behavior, whereas physical processes are always under some energy limit. Basically it is the 2nd law.
BUT for the exceptional world, causes multiply or cascade: earthquake magnitudes, network connectivity, etc. So you get log-normal or fat-tailed distributions.
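Factor 1 can be checked against closed-form differential entropies; a small sketch comparing three unit-variance distributions (the choice of comparison distributions is mine):

```python
import numpy as np

# Closed-form differential entropies (in nats) of three unit-variance
# distributions; the Gaussian wins, as max-entropy-at-fixed-variance predicts.
sigma = 1.0
h_gauss = 0.5 * np.log(2 * np.pi * np.e * sigma**2)  # ~1.419
h_laplace = 1 + np.log(np.sqrt(2) * sigma)           # scale b = sigma/sqrt(2), ~1.347
h_uniform = np.log(np.sqrt(12) * sigma)              # width sqrt(12)*sigma, ~1.242

print(round(float(h_gauss), 3), round(float(h_laplace), 3), round(float(h_uniform), 3))
print(bool(h_gauss > h_laplace > h_uniform))  # True
```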
https://en.wikipedia.org/wiki/Galton_board
I saw one of these at the (I think) Boston Science Museum when I was a kid. They have some pretty cool videos on YouTube if you're curious.
Edit: see e.g. John Baez's write-up "What is Entropy?" on the entropy-maximization principle, where Gaussians make an entrance.
All summation roads lead to normal curves. (There might be an exception for weird probability distributions that do not have a mean, such as the Cauchy distribution; I was surprised when I learned these exist.)
Life is full of sums. Height? That's a sum of genetics and nutrition, and both of those can be broken down into other sums. How long the treads last on a tire? That's a sum of all the times the tire has been driven, and all of those times driving are just sums of every turn and acceleration.
I'm not a data scientist. I'm just a programmer that works with piles of poorly designed business logic.
How did I do in my interview? (I am looking for a job.)
If I had made the extra condition that the random variables had finite variance, you'd be correct. Without the finite variance condition, the distribution is Levy stable.
Levy stable distributions can have finite mean but infinite variance. They can also have infinite mean and infinite variance. Only in the finite mean and finite variance case does it imply a Gaussian.
Levy stable distributions are also called "fat-tailed", "heavy-tailed" or "power law" distributions. In some sense, Levy stable distributions are more normal than the normal distribution. It might be tempting to dismiss the infinite variance condition but, practically, this just means you get larger and larger numbers as you draw from the distribution.
This was one of Mandelbrot's main positions, that power laws were much more common than previously thought and should be adopted much more readily.
As an aside, if you do ever get asked this in an interview, don't expect to get the job if you answer correctly.
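The "larger and larger numbers" point above is easy to see with a running mean; a minimal sketch contrasting standard Cauchy draws (an α = 1 stable law with no mean) against a normal sample:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
ks = np.arange(1, n + 1)

# Running mean of standard Cauchy draws (alpha = 1 stable, no mean):
# a single huge draw can dwarf everything seen so far, so it never settles.
cauchy_mean = np.cumsum(rng.standard_cauchy(n)) / ks
print(cauchy_mean[999], cauchy_mean[99_999], cauchy_mean[-1])

# A normal sample's running mean, by contrast, converges to 0:
normal_mean = np.cumsum(rng.standard_normal(n)) / ks
print(bool(abs(normal_mean[-1]) < 0.01))  # True
```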
But the counterintuitive thing about the CLT is that it applies to distributions that are not normal.
For simplicity, take N identically distributed random variables that are uniform on the interval [-1/2, 1/2], so the probability density function f(x) is 1 on that interval.
The Fourier transform of f(x), F(w), is essentially sin(w)/w. Taking only the first few terms of the Taylor expansion, ignoring constants, gives (1-w^2).
Convolution is multiplication in Fourier space, so you get (1-w^2)^n. Squinting, (1-w^2)^n ~ (1-n w^2 / n)^n ~ exp(-n w^2). The Fourier transform of a Gaussian is a Gaussian, so the result holds.
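A numerical sanity check of the argument (n = 12 is an arbitrary choice): the variance of the sum is n/12, and the excess kurtosis shrinks like 1/n toward the Gaussian value of 0.

```python
import numpy as np

rng = np.random.default_rng(4)

# Sum of n = 12 uniforms on [-1/2, 1/2]: variance n/12 = 1, and the shape
# is already close to Gaussian (keeping the constants the argument ignores,
# the transform of the sum is ~exp(-n w^2 / 24)).
n = 12
sums = rng.uniform(-0.5, 0.5, size=(500_000, n)).sum(axis=1)
print(round(float(sums.var()), 3))  # ~1.0

# Excess kurtosis of one uniform is -6/5, so the sum's is (-6/5)/n = -0.1,
# shrinking toward the Gaussian value of 0 as n grows.
z = (sums - sums.mean()) / sums.std()
excess = float(np.mean(z**4) - 3)
print(round(excess, 3))  # ~ -0.1
```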
Unfortunately I haven't worked it out myself, but I've been told that if you fiddle with the exponent of 2 (presumably choosing it to be in the range (0,2]), this gives the motivation for Levy stable distributions, which is another way to see why fat-tailed/Levy stable distributions are so ubiquitous.
Uniform distributions with different widths and centers still have characteristic functions that are quadratic near the origin, so the above argument only needs to be minimally changed.
The added bonus is that if the (1-w^2)^n is replaced by (1-w^a)^n, you can sort of see how to get at the Levy stable distribution (see the characteristic function definition [0]).
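For the α = 1 case this is checkable by hand: numerically inverting the characteristic function exp(−|w|) (plain trapezoid rule; the grid is an arbitrary choice) recovers the Cauchy density 1/(π(1+x²)), a stable law with the promised fat tails.

```python
import numpy as np

# Stable characteristic function exp(-|w|^alpha) with alpha = 1: inverting
# it numerically recovers the Cauchy density 1 / (pi * (1 + x^2)).
w = np.linspace(0.0, 50.0, 200_001)
dw = w[1] - w[0]

density = {}
for x in (0.0, 1.0, 3.0):
    # f(x) = (1/pi) * integral_0^inf cos(w x) exp(-w) dw, trapezoid rule
    integrand = np.cos(w * x) * np.exp(-w)
    density[x] = float((integrand.sum() - 0.5 * (integrand[0] + integrand[-1])) * dw / np.pi)
    print(x, round(density[x], 5), round(1 / (np.pi * (1 + x**2)), 5))
```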
The point is that this gives a simple, high-level motivation as to why it's so common. Aside from seeing this flavor of proof in "An Invitation to Modern Number Theory" [1], I haven't really seen it elsewhere (though, to be fair, I'm not a mathematician). I had also never heard the connection of this method to Levy stable distributions except from someone who explained it to me personally.
I disagree about the audience for Quanta. They tend to be exposed to higher level concepts even if they don't have a lot of in depth experience with them.
[0] https://en.wikipedia.org/wiki/Stable_distribution#Parametriz...
[1] https://www.amazon.com/Invitation-Modern-Number-Theory/dp/06...
Unfortunately, many "researchers" blindly assume that real-life phenomena follow a Gaussian when many of them don't... so their models end up skewed.
The causal chain is: the math is simple -> teachers teach simple things -> students learn what they're taught -> we see the world in terms of concepts we've learned.
The central limit theorem generalizes beyond simple math to hard math: Levy alpha-stable distributions when variance is not finite, and the Fisher-Tippett-Gnedenko theorem with its Gumbel/Fréchet/Weibull distributions for extreme values. Those curves are also everywhere, but we don't see them because we weren't taught them, because the math is tough.
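The extreme-value side is just as easy to demo as the CLT; a sketch of Fisher-Tippett-Gnedenko for Exponential(1) maxima (sample sizes are arbitrary), which converge to the Gumbel law exp(−exp(−x)) after shifting by log n:

```python
import numpy as np

rng = np.random.default_rng(5)

# Maxima of n Exponential(1) draws, shifted by log(n), converge to the
# Gumbel distribution exp(-exp(-x)) (Fisher-Tippett-Gnedenko).
n, n_trials = 500, 20_000
maxima = rng.exponential(size=(n_trials, n)).max(axis=1) - np.log(n)

# Compare the empirical CDF with the Gumbel limit at a few points.
empirical = {}
for x in (-1.0, 0.0, 2.0):
    empirical[x] = float(np.mean(maxima <= x))
    print(x, round(empirical[x], 3), round(float(np.exp(-np.exp(-x))), 3))
```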
We can use Calculus to do so much but also so little…
In practice, when modeling, you are almost always better off not assuming normality, and you want to test models that allow the possibility of heavy tails. The CLT is an approximation, and modern robust methods, or Bayesian methods that don't assume Gaussian priors, are almost always better models. But this of course calls into question the very universality of the CLT (i.e. it is natural in math, but not really in nature).
Statisticians love averages, so anything that could plausibly be sampled as a normal distribution will be presented as one
The median is actually more descriptive, and power laws are equally pervasive, if not more so
* excluding bizarre degenerates like constants or impulse functions
He has several other related videos also.
https://www.youtube.com/@3blue1brown/search?query=convolutio...