smilekzs 2 days ago
Not OP, but I think this could be a leaky abstraction at work. Most of the time you hand-write an accelerator kernel to optimize runtime performance. If the abstraction/compiler does not fully insulate you from micro-architectural details that affect performance in non-trivial ways (e.g. memory bank conflicts, as mentioned in the article), then you still end up with per-vendor implementations, or compile-time if-else blocks all over the place. This is less than ideal, but still arguably better than working with separate vendor APIs or, worse, completely separate toolchains.
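To make the "compile-time if-else" point concrete, here is a minimal C++17 sketch. Everything in it is hypothetical: the `Vendor` tags, the bank counts, and the `padded_stride` helper are invented for illustration and are not taken from the article or from any real toolchain.

```cpp
#include <cstddef>

// Hypothetical vendor tags; not from any real toolchain.
enum class Vendor { A, B };

// Hypothetical per-vendor bank counts for on-chip memory
// (illustrative values only).
template <Vendor V>
constexpr std::size_t bank_count() {
    if constexpr (V == Vendor::A) {
        return 32;
    } else {
        return 64;
    }
}

// Row stride (in elements) for a tile staged in banked memory.
// Simplified rule: if the tile width is a multiple of the bank count,
// every element of a column maps to the same bank, so pad each row by
// one element. Other widths are left unchanged in this sketch.
template <Vendor V>
constexpr std::size_t padded_stride(std::size_t tile_width) {
    return (tile_width % bank_count<V>() == 0) ? tile_width + 1 : tile_width;
}
```

The kernel source stays shared, but the same tile width gets a different layout per vendor, which is the kind of micro-architectural detail that leaks through the abstraction.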
whimsicalism 2 days ago
Yes, it looks like they have some sort of metaprogramming setup (nicer than C++) for doing this: https://www.modular.com/mojo