mikrl 8 hours ago

Great article. Personally I have been learning more about the mathematics of beyond-CLT scenarios (fat tails, infinite variance etc)

The great philosophical question is why CLT applies so universally. The article explains it well as a consequence of the averaging process.

Alternatively, I’ve read that natural processes tend to exhibit Gaussian behaviour because there is a tendency towards equilibrium: forces, homeostasis, central potentials and so on and this equilibrium drives the measurable into the central region.

For processes such as prices in financial markets, with complicated feedback loops and reflexivity (in the Soros sense), the probability mass tends to end up in the non-central region, where the CLT does not apply.

parpfish 8 hours ago | parent | next [-]

As to ye philosophy of “why” the CLT gives you normals, my hunch is that it’s because there’s some connection between:

a) the CLT requires samples drawn from a distribution with finite mean and variance

and b) the Gaussian is the maximum entropy distribution for a particular mean and variance

I’d be curious about what happens if you start making assumptions about higher-order moments in the distro
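A quick numerical check of (b), using closed-form differential entropies; the comparison distributions and the unit-variance parameterizations are my own choices for illustration:

```python
import math

# Differential entropies of three zero-mean, unit-variance distributions,
# from their closed-form expressions. Max-entropy says the Gaussian wins.
gaussian = 0.5 * math.log(2 * math.pi * math.e)  # N(0, 1)
uniform = math.log(math.sqrt(12))                # width sqrt(12) -> variance 1
laplace = 1 + math.log(2 / math.sqrt(2))         # scale b = 1/sqrt(2) -> var 2b^2 = 1

print(gaussian, uniform, laplace)                # the Gaussian is largest
```

As for higher-order moments: adding a constraint on a higher moment changes the maximizer to a density of the form exp(polynomial in x), which is one way to explore the question in the last line.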

orangemaen 6 hours ago | parent | next [-]

The standard framing defines the Gaussian as this special object with a nice PDF, then presents the CLT as a surprising property it happens to have. But convolution of densities is the fundamental operation. If you keep convolving any finite-variance distribution with itself, the shape converges, and we called the limit "normal." The Gaussian is a fixed point of iterated convolution under √n rescaling. It earned its name by being the thing you inevitably get, not by having elegant closed-form properties.
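A minimal sketch of that fixed-point view, with an arbitrary lopsided starting distribution I made up:

```python
import numpy as np

# Start from a deliberately skewed distribution on {0, 1, 2} and convolve
# it with itself repeatedly, i.e. take the distribution of a sum of iid copies.
p = np.array([0.7, 0.1, 0.2])
dist = p.copy()
for _ in range(63):               # distribution of the sum of 64 copies
    dist = np.convolve(dist, p)

# Compare against the Gaussian with the matching mean and variance.
support = np.arange(len(dist))
mean = (support * dist).sum()
var = ((support - mean) ** 2 * dist).sum()
gauss = np.exp(-(support - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
print(np.abs(dist - gauss).max())  # the two shapes nearly coincide
```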

The most interesting assumptions to relax are the independence assumptions. They're way more permissive than the textbook version suggests. You need dependence to decay fast enough, and mixing conditions (α-mixing, strong mixing) give you exactly that: correlations that die off let the CLT go through essentially unchanged. Where it genuinely breaks is long-range dependence: fractionally integrated processes, Hurst parameter above 0.5, where autocorrelations decay hyperbolically instead of exponentially. There the √n normalization is wrong, you get different scaling exponents, and sometimes non-Gaussian limits.

There are also interesting higher order terms. The √n is specifically the rate that zeroes out the higher-order cumulants. Skewness (third cumulant) decays at 1/√n, excess kurtosis at 1/n, and so on up. Edgeworth expansions formalize this as an asymptotic series in powers of 1/√n with cumulant-dependent coefficients. So the Gaussian is the leading term of that expansion, and Edgeworth tells you the rate and structure of convergence to it.
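The 1/√n decay of skewness is easy to see empirically. Exp(1) as the parent is my choice here: its skewness is 2, and since cumulants add across independent summands, the standardized sum of n copies has skewness exactly 2/√n:

```python
import numpy as np

rng = np.random.default_rng(0)

# Exp(1) has skewness 2; for the standardized sum of n iid copies
# the third cumulant predicts skewness 2 / sqrt(n).
results = {}
for n in [4, 16, 64]:
    sums = rng.exponential(size=(200_000, n)).sum(axis=1)
    z = (sums - sums.mean()) / sums.std()
    results[n] = (z ** 3).mean()               # sample skewness
    print(n, results[n], 2 / np.sqrt(n))       # empirical vs predicted
```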

ramblingrain 6 hours ago | parent | prev | next [-]

It is the not knowing, the unknown unknowns and known unknowns which result in the max entropy distribution's appearance. When we know more, it is not Gaussian. That is known.

mitthrowaway2 5 hours ago | parent [-]

Exactly this. From this perspective, the CLT then can be restated as: "it's interesting that when you add up a sufficiently large number of independent random variables, then even if you have a lot of specific detailed knowledge about each of those variables, in the end all you know about their sum is its mean and variation. But at least you do reliably know that much."

D-Machine 4 hours ago | parent [-]

Came here basically looking to see this explanation. Normal dist is [approximately] common when summing lots of things we don't understand, otherwise, it isn't really.

sobellian 7 hours ago | parent | prev | next [-]

IIRC there's a video by 3b1b that talks about that, and it is important that gaussians are closed under convolution.
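The closure property is easy to verify numerically: convolving the densities of N(0, 1) and N(0, 2) on a grid reproduces the density of N(0, 3), with the variances simply adding. Grid spacing and range here are arbitrary choices:

```python
import numpy as np

def gauss(x, var):
    """Density of N(0, var) at points x."""
    return np.exp(-x ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

dx = 0.01
x = np.linspace(-20, 20, 4001)   # odd length keeps the convolution centered

# Discrete convolution of densities ~ density of the sum of the two RVs.
conv = np.convolve(gauss(x, 1.0), gauss(x, 2.0), mode="same") * dx
print(np.abs(conv - gauss(x, 3.0)).max())  # tiny: result is N(0, 1 + 2)
```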

gowld 6 hours ago | parent [-]

That makes it an equilibrium point in function space, but the other half is why it's a global attractor.

derbOac 7 hours ago | parent | prev [-]

IIRC the third moment defines a maxent distribution under certain conditions, and with a fourth moment it becomes undefined? It's been a while though.

If I'm remembering it correctly it's interesting to think about the ramifications of that for the moments.

benmaraschino 8 hours ago | parent | prev [-]

You (and others) may enjoy going down the rabbit hole of universality. Terence Tao has a nice survey article on this which might be a good place to start: https://direct.mit.edu/daed/article/141/3/23/27037/E-pluribu...