| ▲ | augment_me 2 hours ago | |
My comment aims to highlight that the "GPU Bubble" is framed as a general solution when it's not; it's a specific bottleneck that depends on your model size. You don't mention your model size anywhere, so the reader has to infer it from the runtimes, and if they don't know the average forward-pass time of a model, well, too bad, they will leave without understanding the actual trade-off. The benchmarks you point to in the section titled "A cost model for the bubble" don't include any CPU overheads or the T_block-T_pipe you mention; they just give the improvement %. In general, your answers here in the thread read as defensive and unhumble. They leave a sour taste about your company; you should consider how you engage with your audience. | ||