▲ | menaerus 18 hours ago | |
> In your 5 sample example, you can't determine if there are any outliers. You need more samples I think the same issue is present no matter how many samples we collect. Statistical apparatus of choice may indeed tell us that given sample is an outlier in our experiment setup but what I am wondering is what if the sample was an actual signal that we measured and not noise. Concrete example: in 10/100 test-runs you see a regression of 10%. The rest of the test-runs show 3%. You can 10x or 100x that example if you wish. Are those 10% regressions the outliers because "the environment was noisy" or did our code really run slower for whatever conditions/reasons in those experiments? > Then using fairly standard nonparametric measures of dispersion and central tendency, a summary statistic should make sense, due to CLT. In theory yes, and for sufficiently large N (samples). Sometimes you cannot afford to reach this "sufficiently large N" condition. |