imtringued a day ago

Correct. There is too much architectural divergence between GPU vendors. If they really wanted to avoid vendor-specific extensions in user-level code, they would have gone with something loosely inspired by tinygrad (which isn't ready yet).

Basically, you provide a good description of the hardware, and the compiler automatically generates a state-of-the-art GEMM kernel for it.

Maybe it's 20% slower than Nvidia's hand-written kernels, but you can switch hardware vendors or build arbitrary fused kernels at will.
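
To make that concrete, here is a toy sketch of the idea in plain Python. It is not tinygrad's API or anyone's real compiler: `HardwareDesc`, `pick_tile`, and `make_gemm` are hypothetical names, the "hardware description" is just a scratchpad size, and the tiling heuristic is an assumption chosen for illustration. The point is only that the same generator yields a differently tiled GEMM per target, and that an epilogue (e.g. ReLU) can be fused into the kernel instead of running as a separate pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareDesc:
    """Hypothetical hardware description; the fields are illustrative only."""
    name: str
    scratchpad_bytes: int  # fast on-chip memory available to one tile step
    elem_bytes: int = 4    # bytes per matrix element


def pick_tile(hw):
    """Largest power-of-two T such that three TxT tiles (A, B, C) fit on chip."""
    t = 1
    while 3 * (2 * t) ** 2 * hw.elem_bytes <= hw.scratchpad_bytes:
        t *= 2
    return t


def make_gemm(hw, epilogue=None):
    """'Compile' a tiled GEMM for hw; epilogue is fused into each output tile."""
    t = pick_tile(hw)

    def gemm(a, b):
        m, k, n = len(a), len(b), len(b[0])
        c = [[0.0] * n for _ in range(m)]
        for i0 in range(0, m, t):
            for j0 in range(0, n, t):
                # Accumulate this output tile over all K tiles.
                for k0 in range(0, k, t):
                    for i in range(i0, min(i0 + t, m)):
                        for j in range(j0, min(j0 + t, n)):
                            acc = c[i][j]
                            for kk in range(k0, min(k0 + t, k)):
                                acc += a[i][kk] * b[kk][j]
                            c[i][j] = acc
                # Fused epilogue: applied while the tile is still "hot",
                # rather than in a second pass over the whole output.
                if epilogue is not None:
                    for i in range(i0, min(i0 + t, m)):
                        for j in range(j0, min(j0 + t, n)):
                            c[i][j] = epilogue(c[i][j])
        return c

    gemm.tile = t  # expose the chosen tile size for inspection
    return gemm


# Two made-up targets; switching vendors is just a different description.
vendor_a = HardwareDesc("vendor_a", scratchpad_bytes=192)
vendor_b = HardwareDesc("vendor_b", scratchpad_bytes=48)

relu = lambda x: x if x > 0 else 0.0
gemm_a = make_gemm(vendor_a)
gemm_b_relu = make_gemm(vendor_b, epilogue=relu)
```

A real system would search over far more than one tile size (vector widths, memory layouts, tensor-core shapes, scheduling), but the shape of the trade is the same: the kernel is derived from the description rather than written by hand per vendor.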