| ▲ | stymaar 6 hours ago | |
> That 3090 is going to burn 750W

The 3090's TDP is 350W, but since LLM token generation isn't compute bound, people usually undervolt these cards to reduce power consumption. IIRC you can get as low as 200-250W without any degradation. Caveat: these figures are without speculative decoding and at batch size = 1. | ||
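For anyone wanting to try this, here is a minimal sketch of the simpler variant: a driver-level power cap rather than a true undervolt (which adjusts the voltage/frequency curve and needs other tooling). It only builds the `nvidia-smi` command lines and does not run them. The helper name `power_cap_commands` and the GPU indices are made up for illustration; `-i` and `-pl` are the standard `nvidia-smi` options for selecting a GPU and setting its power limit in watts.

```python
from typing import List


def power_cap_commands(gpu_indices: List[int], watts: int) -> List[List[str]]:
    """Build one `nvidia-smi` power-limit command per GPU (hypothetical helper).

    Nothing is executed here. Applying a limit normally needs root, and the
    driver rejects values outside the range the card supports.
    """
    if watts <= 0:
        raise ValueError("power limit must be a positive number of watts")
    return [["nvidia-smi", "-i", str(i), "-pl", str(watts)] for i in gpu_indices]


# Assumed setup: cap GPU 0 at 250 W, the upper end of the range quoted above.
for cmd in power_cap_commands([0], 250):
    print(" ".join(cmd))
```

To actually apply it you would pass each list to `subprocess.run(cmd, check=True)` as root. The limit typically resets on reboot, so people usually put it in a startup script.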
| ▲ | 4chandaily 5 hours ago | parent [-] | |
This is correct. I have four 3090s in my inference server, and they are each capped at 250 W. I run Qwen 3.5 122B-A10 at about 45-50 tok/s on this and am quite happy with it. At idle, all four together draw around 95-105 W, which is a bit high, but tolerable. | ||
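As a rough back-of-the-envelope check on those numbers, the sketch below turns the power cap and throughput into an upper bound on GPU energy per token. It assumes all four cards sit at their 250 W cap during generation, which is a worst case and not a measurement, and it ignores the CPU, fans, and PSU losses. The function name `joules_per_token` is made up for illustration.

```python
def joules_per_token(total_watts: float, tokens_per_second: float) -> float:
    """Energy per generated token, given sustained power draw and throughput."""
    return total_watts / tokens_per_second


NUM_GPUS = 4
CAP_WATTS = 250  # per-card power cap quoted above
total_cap_watts = NUM_GPUS * CAP_WATTS  # worst-case GPU draw under load

# Throughput range quoted above: 45-50 tok/s.
for rate in (45, 50):
    energy = joules_per_token(total_cap_watts, rate)
    print(f"{rate} tok/s: at most {energy:.1f} J/token")
```

That works out to roughly 20-22 J per token at most. Actual draw during token generation may well sit below the cap.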