melodyogonna | 6 days ago
Very interesting. I wonder about the model of storing the GPU IR in the binary for a real-world project; it seems like that could bloat the binary size a lot. I also wonder about the performance of compiling for a target GPU ahead of time. These GPUs can be very different even when they come from the same vendor, so this seems like it would compile to the lowest common denominator for each vendor, leaving performance on the table. For example, Nvidia H100s and Nvidia Blackwell GPUs are different beasts with specialised intrinsics that are not shared, and generating PTX that works on both means forgoing the specialised features of at least one of them. Mojo solves these problems by JIT compiling GPU kernels at the point where they're launched.
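(An aside to make the "specialize vs. stay portable" tradeoff concrete: on CPUs, Rust handles the same problem by compiling several versions of a function and picking one at runtime with feature detection, an analogy the reply below also draws. This is a minimal, self-contained sketch; the function names and the toy workload are mine, not from the article.)

```rust
// Runtime dispatch between a specialized and a portable code path,
// the CPU analogue of picking a GPU kernel variant per device.

/// Sums a slice using the AVX2 build if the running CPU supports it.
#[cfg(target_arch = "x86_64")]
fn sum(data: &[f32]) -> f32 {
    if is_x86_feature_detected!("avx2") {
        // SAFETY: we just checked that AVX2 is available on this CPU.
        unsafe { sum_avx2(data) }
    } else {
        sum_portable(data)
    }
}

/// Specialized build of the same source: the compiler may emit AVX2 here.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(data: &[f32]) -> f32 {
    data.iter().copied().sum()
}

/// Lowest-common-denominator build that runs on any CPU.
fn sum_portable(data: &[f32]) -> f32 {
    data.iter().copied().sum()
}

fn main() {
    let v: Vec<f32> = (0..1024).map(|i| i as f32).collect();
    #[cfg(target_arch = "x86_64")]
    println!("sum = {}", sum(&v));
    #[cfg(not(target_arch = "x86_64"))]
    println!("sum = {}", sum_portable(&v));
}
```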
LegNeato | 6 days ago | parent
The underlying projects support loading at runtime, so you could ship as many AOT-compiled kernel variants as you want and load the one you need. Disk is cheap. You could even ship rustc for "JIT" if you really wanted (lol); that's not as crazy as it sounds, since Mojo is LLVM-based anyway.

There is of course the warmup/stuttering problem, but that's a separate issue, and it's (sometimes) less of a problem for compute than for graphics, where it's a bigger deal. I have some thoughts on how to improve the status quo with things unique to Rust, but it's too early to say. One of the issues with GPUs as a platform is that runtime probing of capabilities is... rudimentary, to say the least. Rust has to deal with similar stuff with CPUs+SIMD, FWIW.

AOT vs JIT is not a new problem domain, and there are no silver bullets, only tradeoffs. Mojo hasn't solved anything in particular; their position in the solution space (JIT) carries the same tradeoffs as anyone else's JIT.
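(A minimal sketch of the "ship several AOT variants, load the one you want" idea described above. Everything here is an assumption for illustration: the .ptx file names, the env-var "probe" standing in for a real device query, and the saxpy kernel they would contain; a real program would ask the CUDA driver for the device's compute capability and hand the chosen PTX to its module-loading API.)

```rust
// Runtime selection among AOT-compiled kernel variants.

/// AOT-compiled variants embedded in the binary. They could just as well
/// live on disk next to the executable ("disk is cheap").
const PTX_SM70: &str = include_str!("kernels/saxpy.sm_70.ptx");
const PTX_SM90: &str = include_str!("kernels/saxpy.sm_90.ptx");

/// Stand-in for a real device query (e.g. cuDeviceGetAttribute with the
/// compute-capability attributes). Returns (major, minor).
fn device_compute_capability() -> (u32, u32) {
    match std::env::var("SM_MAJOR").ok().and_then(|s| s.parse().ok()) {
        Some(major) => (major, 0),
        None => (7, 0),
    }
}

/// Pick the most specialized variant the device can run, falling back to
/// the lowest-common-denominator build.
fn select_ptx() -> (&'static str, &'static str) {
    match device_compute_capability() {
        (major, _) if major >= 9 => ("sm_90", PTX_SM90),
        _ => ("sm_70", PTX_SM70),
    }
}

fn main() {
    let (arch, ptx) = select_ptx();
    println!("loading {arch} variant ({} bytes of PTX)", ptx.len());
    // From here, pass `ptx` to whatever module-loading API you use
    // (the CUDA driver's cuModuleLoadData, or a wrapper crate).
}
```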