krona | a day ago
These workarounds might be good enough to detect ~1-5% changes from a baseline of a native, pre-compiled application. However, this won't be sufficient for many dynamic, JIT-compiled languages, which will usually still have large amounts of inter-run variance due to timing-sensitive compilation choices of the runtime. A statistically significant ~10% change can be hard to detect in these circumstances from a single run. In my experience multi-run benchmarking frameworks which use non-parametric statistics should be the default tool of choice unless you know the particular benchmark is exceptionally well behaved.
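A minimal sketch of the kind of non-parametric, multi-run comparison described above, assuming Python with numpy and scipy; the timing data, sample sizes, and benchmark semantics are illustrative placeholders, not taken from any real benchmark:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Suppose each array holds wall-clock times (ms) from repeated runs
# of the same benchmark before and after a change (placeholder data).
baseline = rng.normal(100, 8, size=30)
candidate = rng.normal(105, 8, size=30)

# Mann-Whitney U makes no normality assumption; it tests whether one
# distribution tends to produce larger values than the other.
stat, p = stats.mannwhitneyu(baseline, candidate, alternative="two-sided")

# Report medians rather than means: medians are robust to the long
# right tail that JIT warm-up and scheduler noise tend to produce.
print(f"median baseline={np.median(baseline):.1f}ms "
      f"candidate={np.median(candidate):.1f}ms p={p:.4f}")
```

Rank-based tests like this give up some statistical power in exchange for robustness to the skewed, multi-modal timing distributions that JIT runtimes tend to produce.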
Sesse__ | a day ago
> In my experience multi-run benchmarking frameworks which use non-parametric statistics should be the default tool of choice unless you know the particular benchmark is exceptionally well behaved.

Agreed. Do you have any suggestions? :-)
menaerus | a day ago
> These workarounds might be good enough to detect ~1-5% changes from a baseline of a native, pre-compiled application.

and

> However this won't be sufficient for many dynamic, JIT-compiled languages which will usually still have large amounts of inter-run variance due to timing-sensitive compilation choices of the runtime.

are not mutually exclusive. Any sufficiently complex statically compiled application will suffer from the same variance issues.

> A statistically significant ~10% change can be hard to detect in these circumstances from a single run.

Multiple runs do not solve the problem. For example, if your 1st test run reports 11%, your 2nd 8%, your 3rd 18%, your 4th 9%, and your 5th 10%, how do you decide whether the 18% from the 3rd run is noise or signal?
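To make that point concrete, here is a minimal sketch, assuming Python with numpy, that bootstraps a confidence interval for the median of those five reported deltas; the five numbers come from the comment above, everything else is illustrative:

```python
import numpy as np

deltas = np.array([11, 8, 18, 9, 10])  # per-run % change from the comment
rng = np.random.default_rng(0)

# Bootstrap a 95% confidence interval for the median % change by
# resampling the five observations with replacement.
medians = np.array([
    np.median(rng.choice(deltas, size=deltas.size, replace=True))
    for _ in range(10_000)
])
lo, hi = np.percentile(medians, [2.5, 97.5])
print(f"median={np.median(deltas):.0f}%, 95% bootstrap CI ~ [{lo:.0f}%, {hi:.0f}%]")

# With n=5 the interval spans roughly 8% to 18%: the data alone cannot
# say whether the 18% run is an outlier or closer to the true effect.
```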