WCSTombs 4 hours ago

It's not a bad article, but I have to point something out:

> Laplace distilled this structure into a simple formula, the one that would later be known as the central limit theorem. No matter how irregular a random process is, even if it’s impossible to model, the average of many outcomes has the distribution that it describes. “It’s really powerful, because it means we don’t need to actually care what is the distribution of the things that got averaged,” Witten said. “All that matters is that the average itself is going to follow a normal distribution.”

This is not really true, because the central limit theorem requires a huge assumption: that the random process has finite variance. I believe that distributions that don't satisfy that assumption, which we can call heavy-tailed distributions, are much more common in the real world than this discussion suggests. Pointing out that infinities don't exist in the real world is also missing the point, since a distribution that just has a huge but finite variance will require a correspondingly huge number of samples to start behaving like a normal distribution.
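To make the finite-variance point concrete, here is a minimal simulation sketch using only the Python standard library. The helper names (`draw_normal`, `draw_cauchy`, `iqr_of_means`), the seed, and the sample sizes are my own choices for illustration. It measures how spread out the sample mean is over many repeated experiments, using the interquartile range because the Cauchy distribution has no variance to measure.

```python
import math
import random
import statistics

def draw_normal(rng):
    # Standard normal: finite variance, so the CLT applies.
    return rng.gauss(0.0, 1.0)

def draw_cauchy(rng):
    # Standard Cauchy via inverse-CDF sampling: no finite mean or variance.
    return math.tan(math.pi * (rng.random() - 0.5))

def iqr_of_means(draw, n, trials, rng):
    # Interquartile range of the mean of n draws, over `trials` repetitions.
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(trials)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

rng = random.Random(0)  # arbitrary seed, only for repeatability
for n in (1, 100):
    print(n,
          iqr_of_means(draw_normal, n, 2000, rng),
          iqr_of_means(draw_cauchy, n, 2000, rng))
```

For the normal draws the spread should shrink by roughly a factor of 10 between n = 1 and n = 100, while for the Cauchy draws it should stay about the same, since the mean of n standard Cauchy samples is itself standard Cauchy.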

Apart from the universality, the normal distribution has a pretty big advantage over others, which is that it leads to mathematical models that are tractable in practice. To go into slightly more detail: in mathematical modeling, you often define a model that approximates a real-world phenomenon but has some unknown parameters, and you want to determine those parameters in order to complete the model. To do that, you take measurements of the real phenomenon, and you find values for the parameters that best fit the measurements. Crucially, the measurements don't need to be exact, but the distribution of the measurement errors is important. If you assume the errors are independent and normally distributed, then you get a relatively nice optimization problem (least squares, when the errors also share a common variance) compared to most other things. This is, in my opinion, about as responsible for the ubiquity of normal distributions in mathematical modeling as the universality from the central limit theorem.
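As a small illustration of why this is "nice": for a straight-line model with independent, equal-variance normal errors, maximizing the likelihood is the same as minimizing the sum of squared residuals, and that has a closed-form answer. The sketch below assumes a made-up model y = 2x + 1 with noise; the function name `fit_line_least_squares` and all the numbers are hypothetical.

```python
import random

def fit_line_least_squares(xs, ys):
    # Closed-form minimizer of sum((y - (a*x + b))**2) over a and b.
    # Under independent, equal-variance normal errors this is also the
    # maximum-likelihood estimate.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return a, b

rng = random.Random(1)  # arbitrary seed
xs = [i / 10 for i in range(200)]
# Made-up "true" model with normal measurement error.
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]
print(fit_line_least_squares(xs, ys))
```

No iteration or tuning is needed here, which is a large part of the appeal. Most other error distributions give up that closed form.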

However, as most people who solve such problems realize, sometimes we have to contend with these things called "outliers," which by another name are really samples from a heavy-tailed distribution. If you don't account for them somehow, then Bad Things(TM) are likely to happen. So either we try to detect and exclude them, or we replace the normal distribution with something that matches the real data a bit better.
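Here is a toy sketch of the Bad Things, plus one possible robust alternative. The Theil-Sen estimator (median of all pairwise slopes) is my choice for illustration, not the only option; the data and function names are made up.

```python
import statistics
from itertools import combinations

def least_squares_slope(xs, ys):
    # Ordinary least-squares slope; every point has unbounded influence.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def theil_sen_slope(xs, ys):
    # Median of all pairwise slopes; a single wild point moves it very little.
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
              if x2 != x1]
    return statistics.median(slopes)

xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]  # exact line with slope 2 (made-up data)
ys[9] = 200.0                     # one gross outlier

print(least_squares_slope(xs, ys))  # dragged well away from 2
print(theil_sen_slope(xs, ys))      # still 2.0
```

Only 9 of the 45 pairwise slopes involve the bad point, so the median ignores them. The least-squares slope has no such protection.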

Anyway, to connect this all back to the central limit theorem, it's probably fair to say measurement errors tend to be the combined result of many tiny unrelated effects, but the existence of outliers is pretty strong evidence that some of those effects are heavy-tailed and thus we can't rely on the central limit theorem giving us a normal distribution.

abetusk 2 hours ago

It's a gross error for the article to say that. You've hit the issue head on.

Suitably normalized sums of independent, identically distributed random variables, if they converge at all, converge to a Levy stable distribution. That family includes the normal as its finite-variance special case; the rest are fat-tailed (aka heavy-tailed, power law). In this sense, Levy stable distributions are more "normal" than the normal distribution. They also show up with regular frequency all over nature.

As you point out, infinite variance might be dismissed but, in practice, it just ends up producing larger and larger "outliers" as one keeps drawing from the distribution. Infinities are, in effect, a "verb", and so an infinite variance, in this context, just means the distribution spits out larger and larger numbers the more you sample from it.
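A rough sketch of what that looks like, assuming a Pareto distribution with shape 1.5 as the infinite-variance example (finite mean, infinite variance). The helper `running_max`, the checkpoints, and the seed are hypothetical choices of mine.

```python
import random

def running_max(draw, checkpoints, rng):
    # Largest absolute value seen so far, recorded at each checkpoint count.
    largest, seen, out = 0.0, 0, []
    for target in checkpoints:
        while seen < target:
            largest = max(largest, abs(draw(rng)))
            seen += 1
        out.append(largest)
    return out

checkpoints = [100, 10_000, 100_000]
rng = random.Random(0)  # arbitrary seed

# Normal: the running maximum creeps up very slowly.
normal_max = running_max(lambda r: r.gauss(0.0, 1.0), checkpoints, rng)
# Pareto with shape 1.5: the running maximum keeps jumping by large factors.
pareto_max = running_max(lambda r: r.paretovariate(1.5), checkpoints, rng)

print(normal_max)
print(pareto_max)
```

The normal maxima should stay within a handful of standard deviations, while the Pareto maxima should grow by orders of magnitude as the sample count goes up.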

D-Machine 4 hours ago

I believe this is also right: normal distributions are not really ubiquitous, just approximately ubiquitous (and only really if you are "ignoring rare outliers", and if you also close your eyes to all the things we don't actually understand at all).

The point on convergence rates re: the central limit theorem is also a major point otherwise clever people tend to miss, and which comes up in a lot of modeling contexts. Many things which make sense "in the limit" likely make no sense in real world practical contexts, because the divergence from the infinite limit in real-world sizes is often huge.
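One way to see the gap between "in the limit" and real-world sizes is a sketch like the following. It assumes a lognormal with sigma = 2 as the "finite but huge variance" case and n = 30 as a typical sample size; both are my picks, as is the helper name. If the sample mean were already close to normal, it would fall below the true mean about half the time.

```python
import math
import random

def fraction_of_means_below(true_mean, draw, n, trials, rng):
    # Fraction of repeated experiments whose sample mean undershoots true_mean.
    count = 0
    for _ in range(trials):
        sample_mean = sum(draw(rng) for _ in range(n)) / n
        if sample_mean < true_mean:
            count += 1
    return count / trials

rng = random.Random(0)  # arbitrary seed
n, trials = 30, 2000
sigma = 2.0                               # assumed; variance is finite for any sigma
lognormal_mean = math.exp(sigma ** 2 / 2)  # true mean of lognormal(0, sigma)

uniform_frac = fraction_of_means_below(0.5, lambda r: r.random(), n, trials, rng)
lognormal_frac = fraction_of_means_below(
    lognormal_mean, lambda r: r.lognormvariate(0.0, sigma), n, trials, rng)

print(uniform_frac)    # should be near 0.5
print(lognormal_frac)  # should be well above 0.5
```

The lognormal has finite variance, so the CLT applies to it in principle, but at n = 30 the sample mean is still strongly skewed: most experiments undershoot the true mean and a few overshoot it badly.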

EDIT: Also, from a modeling standpoint (say, Bayesian), I often care about finding out something like the "range" of possible results for (1) a near-uniform prior, (2) a couple of skewed distributions, with the tail in either direction (e.g. some beta distributions), and (3) a symmetric heavy-tailed distribution (e.g. Cauchy). If you have these, anything assuming normality is usually going to be "within" the range of these assumptions, and so is generally not anything I would care about.

Basically, in practical contexts, you care about tails, so assuming they don't meaningfully exist is a non-starter. Looking at non-robust stats of any kind today, without also checking some robust models or stats, just strikes me as crazy.
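As a minimal example of that kind of check, here is a sketch that puts the non-robust summary (mean and standard deviation) next to a robust one (median and median absolute deviation). The data and the helper `mad` are made up for illustration.

```python
import statistics

def mad(values):
    # Median absolute deviation from the median: a robust measure of spread.
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

# Eight plausible measurements near 10 plus one wild value (made-up data).
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 1000.0]

print(statistics.fmean(data), statistics.stdev(data))  # dominated by the outlier
print(statistics.median(data), mad(data))              # barely affected
```

When the two summaries disagree this sharply, that is usually a sign the tails matter and a normal-based model deserves a second look.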