totalperspectiv 2 days ago
I don’t follow your logic. Mojo can target multiple GPU vendors. What is the Modular-specific lock-in?
subharmonicon a day ago
The blog post is about using an NVIDIA-specific tensor core API that they have built to get good performance. Modular has been pushing the notion that they are building technology that lets users write HW-vendor-neutral solutions so that they can break free of NVIDIA's hold on high-performance kernels. From their own writing:

> We want a unified, programmable system (one small binary!) that can scale across architectures from multiple vendors—while providing industry-leading performance on the most widely used GPUs (and CPUs).
smilekzs 2 days ago
Not OP, but I think this could be an instance of a leaky abstraction at work. Most of the time you hand-write an accelerator kernel because you want to optimize runtime performance. If the abstraction/compiler does not fully insulate you from micro-architectural details that affect performance in non-trivial ways (e.g. memory bank conflicts, as mentioned in the article), then you still end up with per-vendor implementations, or compile-time if-else blocks all over the place. This is less than ideal, but still arguably better than working with separate vendor APIs, or worse, completely separate toolchains.
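To make that concrete, here is a rough sketch of the compile-time if-else pattern, written as vendor-guarded CUDA/HIP C++. Everything in it is illustrative: the transpose_tiled kernel, the TILE and PAD constants, and their values are assumptions chosen only to show how a bank-conflict workaround turns nominally portable code into per-vendor code; none of it is taken from the article or Modular's stack.

    // Illustrative sketch only: constants and kernel are hypothetical, showing
    // how a micro-architectural detail (shared-memory bank conflicts) leaks
    // into per-vendor compile-time branches even in "portable" source.
    #if defined(__HIP_PLATFORM_AMD__)
    constexpr int TILE = 16;  // hypothetical tile size tuned for this vendor
    constexpr int PAD  = 4;   // hypothetical padding chosen to dodge LDS bank conflicts
    #else
    constexpr int TILE = 32;  // hypothetical tile size for the NVIDIA path
    constexpr int PAD  = 1;   // classic +1 padding against shared-memory bank conflicts
    #endif

    // A tiled matrix transpose: the algorithm is vendor-neutral, but the
    // shared-memory layout above is not, which is exactly the leak described.
    __global__ void transpose_tiled(const float* in, float* out, int n) {
      __shared__ float tile[TILE][TILE + PAD];  // PAD breaks stride-TILE bank conflicts
      int x = blockIdx.x * TILE + threadIdx.x;  // input column
      int y = blockIdx.y * TILE + threadIdx.y;  // input row
      if (x < n && y < n)
        tile[threadIdx.y][threadIdx.x] = in[y * n + x];
      __syncthreads();
      // Write the transposed tile; block coordinates are swapped.
      int tx = blockIdx.y * TILE + threadIdx.x;
      int ty = blockIdx.x * TILE + threadIdx.y;
      if (tx < n && ty < n)
        out[ty * n + tx] = tile[threadIdx.x][threadIdx.y];
    }

Launched with a TILE x TILE thread block per tile, both branches compile from the same source file, but the fast configuration differs per vendor, so the "portable" kernel still carries vendor-specific knowledge in it.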