ashvardanian 6 hours ago

Great article — and several other high-quality deep dives linked at the end! Here's another one on the H100 that I found particularly useful: <https://cudaforfun.substack.com/p/outperforming-cublas-on-h1...>

I agree with the author that programming GEMM on newer GPUs is a very different experience, though I wonder whether "newer GPUs are [actually strictly] better". Aren't there still some highly cost-effective use cases for T4 GPUs?