bigyabai, 12 hours ago:
A lot of the TDP is reserved for running the shader units at full power. My RTX 3070 Ti only pulls ~110 W of its 320 W running CUDA inference on Gemma 26b and E4B.
Scaevolus, 12 hours ago:
It's not that it's reserving power; rather, you hit some other bottleneck on a 3070 Ti before running into thermal limits. It's likely limited by either tensor core saturation or RAM throughput. Running the workload with Nvidia's profiling tools should make the bottleneck obvious.
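A quick first check before reaching for the full profilers: a hedged sketch, assuming a machine with the standard NVIDIA driver's `nvidia-smi` on the PATH (the query fields below are the driver's standard ones, but availability can vary by driver version):

```shell
# Sample power draw, power limit, GPU/memory utilization, and SM clock once.
# Guarded so it degrades gracefully on machines without an NVIDIA driver.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=power.draw,power.limit,utilization.gpu,utilization.memory,clocks.sm \
             --format=csv
else
  echo "nvidia-smi not found; install the NVIDIA driver to profile"
fi
```

If `utilization.gpu` is pinned near 100% while `power.draw` sits well below `power.limit`, the heavier tools (Nsight Systems / Nsight Compute) can tell you whether tensor core occupancy or DRAM bandwidth is the actual limiter.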
ycui7, 2 hours ago:
The B70 idles at 30 W, while the RTX PRO 4500 idles at 9 W (measured at 5 W at the wall). The B70 runs at 1/3 the token output rate of the RTX PRO 4500 and consumes 3× the idle power when doing nothing.
culopatin, 4 hours ago:
My 4070 Super and 5070 Super both max out their TDP when I use them with Ollama. Is your usage different?
gambiting, 10 hours ago:
My 5090 runs at full TDP (pretty much exactly 575 W) when running inference through LM Studio.