torginus 4 days ago

It's like writing assembly directly for the GPU's DSP-like SIMD cores, instead of following the CUDA model, where you write code for a single thread (one SIMD lane) and the compiler figures out how to generate the assembly for the core itself.
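To make the contrast concrete, here's a rough CPU-side sketch in plain C++. It is not real GPU code. `saxpy_lane`, `launch_saxpy`, `saxpy_wide`, the `Vec` type and the lane count `kLanes = 8` are all made up for illustration. The first style only describes what one lane does and leaves the fan-out to someone else (standing in for the compiler and hardware); the second style has the programmer handle the whole vector register, including the leftover tail.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lane count; real hardware widths differ.
constexpr std::size_t kLanes = 8;

// Style 1 (CUDA-like): say what ONE lane does for ONE index.
inline void saxpy_lane(std::size_t i, float a, const float* x, float* y) {
    y[i] = a * x[i] + y[i];
}

// Stand-in for the compiler/hardware that fans the per-lane program out.
// Here it is just a loop; the per-lane code never sees the vector width.
void launch_saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < y.size(); ++i) {
        saxpy_lane(i, a, x.data(), y.data());
    }
}

// Style 2 (explicit SIMD): the programmer works on whole vector registers.
using Vec = std::array<float, kLanes>;

Vec load(const float* p) {
    Vec v{};
    for (std::size_t k = 0; k < kLanes; ++k) v[k] = p[k];
    return v;
}

void store(float* p, const Vec& v) {
    for (std::size_t k = 0; k < kLanes; ++k) p[k] = v[k];
}

Vec mul_add(float a, const Vec& x, const Vec& y) {
    Vec r{};
    for (std::size_t k = 0; k < kLanes; ++k) r[k] = a * x[k] + y[k];
    return r;
}

void saxpy_wide(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::size_t n = y.size();
    std::size_t i = 0;
    // Full-width chunks: load, compute, store, all kLanes at a time.
    for (; i + kLanes <= n; i += kLanes) {
        store(&y[i], mul_add(a, load(&x[i]), load(&y[i])));
    }
    // Tail elements that do not fill a whole register are the programmer's problem.
    for (; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Both `launch_saxpy(2.0f, x, y)` and `saxpy_wide(2.0f, x, y)` compute the same result; the difference is who has to think about the vector width and the tail.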