Benchmarking requires a somewhat different setup than the rest of the testing, especially if you want timings accurate down to the millisecond.

We have continuous benchmarking of one of our tools (it's written in C++), and to get the "same" results every time we launch it on the same machine. This is far from ideal, but otherwise there would be noisy neighbours, a pesky host (if it's a VM), etc.

One idea we had: run the same test on the same machine several times, comparing older and newer code (or, ideally, toggling the change through switches). This could work for some code paths, but not for truly continuous check-ins.

Just wondering what folks do. I can guess, but there is always something hidden or not well known.

spockz a day ago | parent | next [-]

I agree that for measuring latency differences you want similar setups. However, if you run two versions of the app concurrently on the same machine, both are affected more or less equally by noisy neighbours. Moreover, by inspecting the flamegraphs you can quickly spot, manually, large shifts in where time is spent. For automatic comparison you can of course use the raw data.
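
A rough sketch of the concurrent comparison, in Python since the harness language is not stated here. The function names, the 10% threshold, and the choice of wall-clock time as the metric are all my assumptions for illustration, not our actual setup:

```python
import statistics
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def timed_run(cmd):
    """Run one command to completion and return its wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def paired_ratios(baseline_cmd, candidate_cmd, rounds=5):
    """Each round, start both builds at the same time so they share the same
    noise, and record the candidate/baseline wall-time ratio."""
    ratios = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        for _ in range(rounds):
            base = pool.submit(timed_run, baseline_cmd)
            cand = pool.submit(timed_run, candidate_cmd)
            ratios.append(cand.result() / base.result())
    return ratios


def is_regression(ratios, threshold=1.10):
    """Flag a regression when the median ratio exceeds the threshold.
    The 10% default is an arbitrary placeholder, not a recommendation."""
    return statistics.median(ratios) > threshold
```

You would call it with something like `paired_ratios(["./tool_old", "--bench"], ["./tool_new", "--bench"])`, where both paths and the flag are hypothetical. Using the per-round ratio rather than absolute times is the point: a noisy neighbour slows both runs in a round, so the ratio moves much less than either timing.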

In addition, you can look at total CPU seconds used, memory allocation at the kernel level, and, specifically for the JVM, the GC metrics and allocation rate. If these numbers change significantly, you know you need to take a look.
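
For the CPU-seconds and kernel-level memory part, a minimal sketch of how those numbers could be collected for a single run. It assumes a Unix host and Python 3.9+; the helper names and the 10% tolerance are made up for illustration, and it does not cover the JVM GC metrics:

```python
import os
import subprocess


def run_with_usage(cmd):
    """Run cmd and return (cpu_seconds, max_rss, exit_code) as reported by
    the kernel for that child process. Unix only."""
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    # wait4 reaps the child and returns its resource usage in one call.
    _, status, usage = os.wait4(proc.pid, 0)
    proc.returncode = os.waitstatus_to_exitcode(status)
    cpu_seconds = usage.ru_utime + usage.ru_stime  # user + system CPU time
    # ru_maxrss is in kilobytes on Linux and in bytes on macOS.
    return cpu_seconds, usage.ru_maxrss, proc.returncode


def changed_significantly(old, new, tolerance=0.10):
    """True when new differs from old by more than the relative tolerance.
    The 10% default is an assumed placeholder."""
    if old == 0:
        return new != 0
    return abs(new - old) / old > tolerance
```

CPU seconds tend to be steadier than wall-clock time on a shared machine, because time spent waiting to be scheduled is not counted.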

We do run this benchmark comparison in most nightly builds and find regressions this way.

	malkia a day ago | parent [-]
		Good points there - Thanks @spockz!

esafak 4 hours ago | parent | prev [-]

https://en.wikipedia.org/wiki/Hardware_performance_counter can help with noisy neighbors. I am still getting into this.