nl · 3 hours ago
> we demonstrated running gpt-oss-120b on two RNGD chips [snip] at 5.8 ms per output token

That's 86 tokens/second/chip. By comparison, an H100 will do 2390 tokens/second/GPU. Am I comparing the wrong things somehow?
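A quick sanity check of the per-chip figure, using only the numbers quoted above (a back-of-envelope sketch, not an independently verified benchmark):

```python
# Numbers taken from the quoted claim: 5.8 ms per output token on two chips.
ms_per_token = 5.8
chips = 2

tokens_per_sec_total = 1000 / ms_per_token        # ~172 tok/s across both chips
tokens_per_sec_per_chip = tokens_per_sec_total / chips

print(f"{tokens_per_sec_total:.0f} tok/s total, "
      f"{tokens_per_sec_per_chip:.0f} tok/s per chip")
# -> 172 tok/s total, 86 tok/s per chip
```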
sanxiyn · 2 hours ago
I think you are comparing latency with throughput. You can't take the inverse of latency to get throughput, because the concurrency is unknown: with N requests in flight, throughput can be up to N times the inverse of the per-token latency. That said, the RNGD result is probably at concurrency = 1.
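A minimal sketch of the distinction. The H100 batch size and per-token latency below are hypothetical, chosen only to show how a high aggregate throughput can coexist with a much worse per-request latency; it also assumes per-token latency is flat across batch sizes, which real hardware only approximates:

```python
def throughput_tok_per_sec(ms_per_token: float, concurrency: int) -> float:
    """Idealized decode throughput: each of `concurrency` in-flight
    requests emits one token per decode step of `ms_per_token` ms.
    Assumes batching does not change per-token latency (an
    approximation; real latency grows with batch size)."""
    return concurrency * 1000 / ms_per_token

# At concurrency = 1, throughput really is the inverse of latency:
print(throughput_tok_per_sec(5.8, 1))    # ~172 tok/s (the RNGD quote, 2 chips)

# A hypothetical GPU serving 64 concurrent requests at 26.8 ms/token
# reports ~2390 tok/s aggregate, despite far higher per-request latency:
print(throughput_tok_per_sec(26.8, 64))  # ~2388 tok/s
```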
binary132 · 3 hours ago
I thought they were saying it was more efficient, as in tokens per watt. I didn't see a direct comparison on that metric, but maybe I didn't look hard enough.
| ||||||||