Calculating GPT-2's Inference Speedups (njkumar.com)
2 points by njkumarr 14 hours ago | 2 comments
p1esk 13 hours ago | parent
Good post, thank you! On an A100 80GB we get 312 teraflops of float16 compute and 1.5 TB/s of memory bandwidth, and this ratio comes out to roughly 208 tokens. A few thoughts:

1. One token != one byte.

2. Your prompt ("Edgar Allan Poe is a") is short (<<300 tokens).

3. Both the flops and the memory bandwidth figures for the A100 are theoretical maximums. Real-world numbers are usually very different and depend on the workload.
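For what it's worth, here is a minimal sketch of that ratio arithmetic in Python, using the peak datasheet numbers quoted above (the variable names are mine, and the result is FLOPs per byte moved, not per token):

    # Back-of-the-envelope ops:byte ratio for an A100 80GB, peak spec numbers.
    peak_fp16_flops = 312e12   # 312 TFLOPS of float16 tensor-core compute
    mem_bandwidth = 1.5e12     # 1.5 TB/s of HBM bandwidth

    # FLOPs the GPU can perform per byte it reads from memory.
    ops_per_byte = peak_fp16_flops / mem_bandwidth
    print(f"{ops_per_byte:.0f} FLOPs per byte")  # ~208

    # Note: each fp16 weight is 2 bytes, so 208 FLOPs/byte is not the same
    # thing as 208 tokens, and real workloads rarely hit either peak figure.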