pama 6 days ago
Wait until you encounter GPU utilization. Two programs can both report 100% utilization and still differ in performance by well over 100x. The names of these metrics invite natural assumptions that are simply wrong. Luckily, it is relatively easy to estimate the FLOP/s throughput of most GPU code and then compare it to the theoretical peak performance of the hardware.
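The estimate described above can be sketched in a few lines. This is a minimal illustration, not a profiler: the matmul shape, the 5 ms runtime, and the 312 TFLOP/s peak (roughly an A100's dense FP16 figure) are all assumed numbers for the example.

```python
# Hedged sketch: estimate achieved FLOP/s for a dense matmul and compare
# it to an assumed theoretical peak. All concrete numbers below are
# illustrative assumptions, not measurements.

def matmul_flops(m: int, n: int, k: int) -> int:
    # An (m x k) @ (k x n) matmul performs ~2*m*n*k floating-point ops
    # (one multiply + one add per inner-product term).
    return 2 * m * n * k

def flops_utilization(m: int, n: int, k: int,
                      elapsed_s: float, peak_flops: float) -> float:
    # Fraction of theoretical peak actually achieved.
    achieved = matmul_flops(m, n, k) / elapsed_s
    return achieved / peak_flops

# Example: an 8192^3 matmul finishing in 5 ms against a 312 TFLOP/s peak
# lands at roughly 70% of peak.
util = flops_utilization(8192, 8192, 8192, 5e-3, 312e12)
```

The point of the exercise is that this ratio, unlike "GPU utilization", is directly comparable across two implementations of the same computation.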
spindump8930 6 days ago
Don't forget that the theoretical peak is (probably) half the performance listed on the NVIDIA datasheet, because the headline numbers are the "with sparsity" figures! I've seen this bite people who miss the asterisk on the figure or aren't used to reading those spec sheets.
BrendanLong 6 days ago
Yeah, the obvious analogue for CPUs would be: (1) measure MIPS with perf, (2) compare that to your processor's max MIPS. Unfortunately, MIPS is too vague a metric, since the amount of work done depends on the instruction, and there's no good way to determine max MIPS for most processors. (╯°□°)╯︵ ┻━┻
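Step (1) above is at least mechanical: `perf stat -e instructions,cycles ./prog` gives you the raw counts, and the conversion to MIPS (and IPC, which is often the more useful number) is simple arithmetic. A sketch, with made-up counts standing in for real perf output:

```python
# Hedged sketch: turn raw counts from
#   perf stat -e instructions,cycles ./prog
# into MIPS and IPC. The counts below are invented for illustration.

def mips(instructions: int, elapsed_s: float) -> float:
    # Millions of instructions retired per second.
    return instructions / elapsed_s / 1e6

def ipc(instructions: int, cycles: int) -> float:
    # Instructions per cycle: closer to a real efficiency signal,
    # since it doesn't depend on clock frequency.
    return instructions / cycles

# e.g. 2e9 instructions retired over 4e9 cycles in 1 second:
example_mips = mips(2_000_000_000, 1.0)   # 2000.0
example_ipc = ipc(2_000_000_000, 4_000_000_000)  # 0.5
```

The step (2) problem remains exactly as stated: there is no single "max MIPS" to divide by, because a stream of NOPs and a stream of AVX-512 FMAs both count as one instruction each.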
saagarjha 6 days ago
If your workload is compute bound, of course. Sometimes you want to look at bandwidth instead. | ||||||||
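A quick way to decide which peak to compare against is a roofline-style check: compute the kernel's arithmetic intensity (FLOPs per byte moved) and compare it to the machine balance (peak FLOP/s divided by peak bandwidth). A sketch, where the hardware peaks are assumed example values:

```python
# Hedged roofline-style check: is a kernel compute-bound or
# bandwidth-bound? The peak figures used in the example call are
# assumptions for illustration (~312 TFLOP/s, ~2 TB/s).

def bound_by(flops: float, bytes_moved: float,
             peak_flops: float, peak_bw: float) -> str:
    intensity = flops / bytes_moved   # FLOPs per byte
    ridge = peak_flops / peak_bw      # machine balance (FLOPs per byte)
    return "compute" if intensity > ridge else "bandwidth"

# A saxpy-like kernel: 2 FLOPs per 12 bytes (read x, read y, write y
# in FP32) is far below a ~156 FLOP/byte ridge, so bandwidth-bound.
kind = bound_by(2, 12, 312e12, 2e12)
```

Below the ridge point, the honest efficiency metric is achieved bytes/s against peak bandwidth, not FLOP/s against peak FLOP/s.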