mturmon 4 days ago
The article you link is not using the CLT correctly. The CLT gives a result about a recentered and rescaled version of the sum of iid variates. It does not give a result about the sum itself, yet the article invokes such a result in the “files” and “lakes” examples.

I’m aware that it can appear that the CLT does say something about the sum itself. The normal distribution of the recentered/rescaled sum can be translated into a distribution pertaining to the sum itself, due to the closure of Normals under linear transformation. But the limiting arguments don’t work any more.

What I mean by that statement: in the CLT, the error of the distributional approximation goes to zero as N gets large. For the sum, of course, the error will not go to zero: the sum itself is diverging as N grows, and so is its distribution. (The point of centering and rescaling is to establish a non-diverging limit distribution.)

So for instance, the third central moment of the Gaussian is zero. But the third central moment of a sum of N iid exponentials grows without bound, linearly in N (the sum is a gamma with shape parameter N, whose third central moment is 2N/λ^3 for rate λ). This third-moment divergence will happen for any base distribution with non-zero skew and a finite third moment.

The above points to another fact about the CLT: it says very little about the tails of the sum's distribution, only about the core. So the CLT does not help with large deviations or very low-probability events. This is another reason the post is mistaken, which you can see in the “files” example where it talks about the upper tail of the sum. The CLT does not apply there.
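To make the exponential example concrete, here is a rough sketch in Python (standard library only). It assumes unit-rate exponentials by default, and the function names are mine, purely for illustration. It computes the exact moments of the sum, then compares the exact upper-tail probability of the sum at 4 standard deviations above its mean with what a Normal with matching mean and variance would give.

```python
import math

def sum_moments(n, rate=1.0):
    """Exact mean, variance and third central moment of a sum of n iid
    Exp(rate) variates, i.e. of a Gamma(shape=n, rate=rate)."""
    return n / rate, n / rate**2, 2 * n / rate**3

def standardized_skew(n):
    """Skewness of the recentered/rescaled sum: third moment / variance^1.5."""
    return 2 / math.sqrt(n)

def gamma_upper_tail(n, x):
    """P(S_n > x) for S_n ~ Gamma(shape=n, rate=1) with integer n.
    Uses the identity P(S_n > x) = P(Poisson(x) <= n - 1), with each
    Poisson term evaluated in log space."""
    log_x = math.log(x)
    return math.fsum(
        math.exp(-x + k * log_x - math.lgamma(k + 1)) for k in range(n)
    )

def normal_upper_tail(z):
    """P(Z > z) for a standard Normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def tail_ratio(n, z=4.0):
    """Exact tail of the sum at z standard deviations above its mean,
    divided by the Normal-approximation tail at the same point."""
    x = n + z * math.sqrt(n)  # mean n, standard deviation sqrt(n) at rate 1
    return gamma_upper_tail(n, x) / normal_upper_tail(z)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        _, _, third = sum_moments(n)
        print(n, third, standardized_skew(n), tail_ratio(n))
```

The third central moment of the raw sum keeps growing with N, while the skew of the standardized sum shrinks like 2/sqrt(N). The tail ratio stays above 1 at these N, so the Normal approximation understates the upper tail even where the core of the distribution already looks Gaussian.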
mturmon 4 days ago
Postscript: looking at the LessWrong link referenced by the post above, you will notice that the “eyeball metric” density plots happen to be recentered and scaled so that they capture the mass of the density. This is the graphical counterpart of the algebraic scaling and centering needed in the CLT.