btbuildem 4 days ago
I'm not sure; I did not run any benchmarks. As a ballpark figure, with both cards throttled down to 250W and running a Qwen-30B FP8 model (variant depending on task), I get upwards of 60 tok/sec. It feels on par with the premium models, tbh. Of course, this is in a single-user environment, with vLLM keeping the model warm.