janwas 2 days ago
CPU-time would over-emphasize regions where many threads are running, right? I find wall-time useful for finding serial regions that aren't yet parallelized. More detail here: https://github.com/dvyukov/perf-load. We recently implemented the same idea without requiring context-switch events: https://github.com/google/highway/blob/master/hwy/profiler.h...
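
To make the CPU-time vs. wall-time point concrete, here is a rough arithmetic sketch. The region names, durations, and thread counts are made up for illustration, and this is not how perf-load or the Highway profiler work internally; it only shows how the two metrics rank the same regions differently.

```python
# Hypothetical workload: (region name, wall-clock seconds, threads busy).
# All numbers are invented for illustration only.
REGIONS = [
    ("serial_setup", 4.0, 1),
    ("parallel_compute", 2.0, 16),
]


def shares(regions):
    """Return {name: (wall_share, cpu_share)} as fractions of the totals."""
    total_wall = sum(wall for _, wall, _ in regions)
    # CPU time of a region is approximated as wall time times busy threads.
    total_cpu = sum(wall * threads for _, wall, threads in regions)
    return {
        name: (wall / total_wall, wall * threads / total_cpu)
        for name, wall, threads in regions
    }


if __name__ == "__main__":
    for name, (wall_share, cpu_share) in shares(REGIONS).items():
        print(f"{name:18s} wall {wall_share:6.1%}  cpu {cpu_share:6.1%}")
```

With these numbers the serial region accounts for two thirds of the wall time but only about 11% of the CPU time, so a CPU-time profile would rank it well below the already-parallel region.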