menaerus | a day ago
> These workarounds might be good enough to detect ~1-5% changes from a baseline of a native, pre-compiled application.

and

> However this won't be sufficient for many dynamic, JIT-compiled languages which will usually still have large amounts of inter-run variance due to timing-sensitive compilation choices of the runtime.

are not mutually exclusive. Any sufficiently complex statically compiled application will suffer from the same variance issues.

> A statistically significant ~10% change can be hard to detect in these circumstances from a single run.

Multiple runs do not solve the problem. For example, if your 1st test run reports 11%, the 2nd 8%, the 3rd 18%, the 4th 9%, and the 5th 10%, how do you decide whether the 18% from the 3rd run is noise or signal?
krona | 19 hours ago | parent
> Multiple runs do not solve the problem. For example, if your 1st test run reports 11%, the 2nd 8%, the 3rd 18%, the 4th 9%, and the 5th 10%, how do you decide whether the 18% from the 3rd run is noise or signal?

With only five samples you can't determine whether there are any outliers. You need more samples, each containing a multitude of observations. Then, using fairly standard nonparametric measures of dispersion and central tendency, a summary statistic becomes meaningful, thanks to the CLT. Outliers only matter if you have to throw away data; good measures of central tendency should be robust to them unless your data is largely noise.

> Any sufficiently complex statically compiled application will suffer from the same variance issues.

Sure, it's a rule of thumb.
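(The approach the comment describes — collapse each run of many observations to a robust per-run statistic, then do nonparametric inference over many runs — can be sketched in a few lines. The Python below is a minimal illustration, not anything from the thread: the timing data is simulated, the helper names are made up, and a percentile bootstrap on the difference of per-run medians stands in for the "standard nonparametric measures" the comment alludes to.)

```python
# Hedged sketch: many runs, many observations per run, robust summary statistic.
import numpy as np

rng = np.random.default_rng(0)

def run_medians(runs):
    """Collapse each run (many observations) to its median -- a robust
    measure of central tendency, insensitive to a few outlier iterations."""
    return np.array([np.median(r) for r in runs])

def bootstrap_median_diff_ci(baseline, candidate, n_boot=10_000, alpha=0.05):
    """Percentile bootstrap CI for the shift in the median of per-run medians.
    Nonparametric: no normality assumption on the raw timings."""
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        b = rng.choice(baseline, size=baseline.size, replace=True)
        c = rng.choice(candidate, size=candidate.size, replace=True)
        diffs[i] = np.median(c) - np.median(b)
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Hypothetical data: 30 runs per build, 200 timing observations per run.
baseline_runs  = [rng.normal(100, 8, 200) for _ in range(30)]
candidate_runs = [rng.normal(110, 8, 200) for _ in range(30)]  # ~10% slower

b_med = run_medians(baseline_runs)
c_med = run_medians(candidate_runs)
lo, hi = bootstrap_median_diff_ci(b_med, c_med)

# If the interval excludes zero, the shift is signal, not noise.
print(f"median shift: {np.median(c_med) - np.median(b_med):.1f} "
      f"(95% CI [{lo:.1f}, {hi:.1f}])")
```

With enough runs, the distribution of the per-run summary statistic tightens (this is where the CLT helps), the bootstrap interval narrows, and a genuine ~10% shift separates cleanly from run-to-run noise without having to decide whether any single run like the 18% one is an outlier.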