uberduper 9 hours ago
There are a few dimensions you can look at for GPU load. Probably the easiest indirect metric to watch is power usage. But if you really care about this, you should actually profile your application. Nsight Systems makes this pretty simple to do. Dunno how many people actually care about having a TUI.
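For anyone who wants to try the power-as-a-proxy idea, here is a minimal sketch. It assumes `nvidia-smi` is on the PATH and that the GPU reports both `power.draw` and `power.limit`; the helper names are made up for illustration, and some boards report `[N/A]` for these fields, which this sketch does not handle.

```python
import subprocess

# nvidia-smi query for current draw and enforced limit, in watts, one line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=power.draw,power.limit",
    "--format=csv,noheader,nounits",
]


def parse_power_csv(text):
    """Turn 'draw, limit' lines (one per GPU) into draw/limit fractions."""
    fractions = []
    for line in text.strip().splitlines():
        draw, limit = (float(field) for field in line.split(","))
        fractions.append(draw / limit)
    return fractions


def sample_power_fractions():
    """Run nvidia-smi once and return the power fraction for each GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_power_csv(out)
```

A fraction near 1.0 only says the board is drawing close to its limit. It says nothing about whether the watts go to math or to memory traffic.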
ManyaGhobadi 8 hours ago
Power is useful as a second-order metric and can help catch drastic underutilization, but it has similar problems to SM Active (DCGM): it tends to overestimate utilization and doesn't distinguish between useful compute and memory traffic. It's very possible to be in a memory-bound workload that draws high power while leaving compute underutilized. Our goal was to separate these bottlenecks out so there's more visibility into where to optimize. On nsys, agreed, it's great, but we wanted something that could run continuously rather than an offline analysis tool. We think there's room for both to be useful.
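To make the compute vs. memory distinction concrete, here is a rough roofline-style sketch. It is not how the tool discussed here works, just an illustration: the function name is hypothetical, the numbers are made up, and it assumes you already have achieved FLOP/s and bandwidth from some profiler or counter source.

```python
def classify_bottleneck(achieved_flops, achieved_bw, peak_flops, peak_bw):
    """Compare achieved throughput to hardware peaks and name the tighter limit.

    All four arguments are rates: FLOP/s for compute, bytes/s for memory.
    Returns (compute_utilization, memory_utilization, label).
    """
    compute_util = achieved_flops / peak_flops
    memory_util = achieved_bw / peak_bw
    # Whichever resource is closer to its peak is the likelier bottleneck.
    label = "memory-bound" if memory_util > compute_util else "compute-bound"
    return compute_util, memory_util, label


# Made-up example: 10 of 100 TFLOP/s used, 1.8 of 2.0 TB/s used.
compute_util, memory_util, label = classify_bottleneck(
    achieved_flops=10e12, achieved_bw=1.8e12, peak_flops=100e12, peak_bw=2.0e12
)
print(f"compute {compute_util:.0%}, memory {memory_util:.0%}: {label}")
```

In that example the memory system is nearly saturated, so power draw can be high even though the compute units are mostly idle, which is the case a single power or SM Active number hides.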