melodyogonna | 6 days ago
Very interesting. I wonder about the model of storing the GPU IR in the binary for a real-world project; it seems like that could bloat the binary size a lot. I also wonder about the performance of compiling for a target GPU ahead of time. These GPUs can be very different even when they come from the same vendor, so this seems like it would compile to the lowest common denominator for each vendor, leaving performance on the table. For example, Nvidia H100s and Nvidia Blackwell GPUs are different beasts with specialised intrinsics that are not shared, and generating PTX that works on both means forgoing the specialised features of at least one of them. Mojo solves these problems by JIT compiling GPU kernels at the point where they're launched.
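(An aside to make the "specialize vs. stay portable" tradeoff concrete: on CPUs, Rust handles the same problem by compiling several versions of a function and picking one at runtime with feature detection, an analogy the reply below also draws. This is a minimal, self-contained sketch; the function names and the toy workload are mine, not from the article.)

```rust
// Runtime dispatch between a specialized and a portable code path,
// the CPU analogue of picking a GPU kernel variant per device.

/// Sums a slice using the AVX2 build if the running CPU supports it.
#[cfg(target_arch = "x86_64")]
fn sum(data: &[f32]) -> f32 {
    if is_x86_feature_detected!("avx2") {
        // SAFETY: we just checked that AVX2 is available on this CPU.
        unsafe { sum_avx2(data) }
    } else {
        sum_portable(data)
    }
}

/// Specialized build of the same source: the compiler may emit AVX2 here.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(data: &[f32]) -> f32 {
    data.iter().copied().sum()
}

/// Lowest-common-denominator build that runs on any CPU.
fn sum_portable(data: &[f32]) -> f32 {
    data.iter().copied().sum()
}

fn main() {
    let v: Vec<f32> = (0..1024).map(|i| i as f32).collect();
    #[cfg(target_arch = "x86_64")]
    println!("sum = {}", sum(&v));
    #[cfg(not(target_arch = "x86_64"))]
    println!("sum = {}", sum_portable(&v));
}
```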
LegNeato | 6 days ago | parent
The underlying projects support loading at runtime, so you could ship as many AOT-compiled kernel variants as you want and load the one you need. Disk is cheap. You could even ship rustc for "JIT" if you really wanted (lol); that's not as crazy as it sounds, since Mojo is LLVM-based anyway.

There is of course the warmup/stuttering problem, but that's a separate issue, and it's (sometimes) less of a problem for compute than for graphics, where it's a bigger deal. I have some thoughts on how to improve the status quo with things unique to Rust, but it's too early to say. One of the issues with GPUs as a platform is that runtime probing of capabilities is... rudimentary, to say the least. Rust has to deal with similar stuff with CPUs+SIMD, FWIW.

AOT vs JIT is not a new problem domain, and there are no silver bullets, only tradeoffs. Mojo hasn't solved anything in particular; their position in the solution space (JIT) carries the same tradeoffs as anyone else's JIT.
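(A minimal sketch of the "ship several AOT variants, load the one you want" idea described above. Everything here is an assumption for illustration: the .ptx file names, the env-var "probe" standing in for a real device query, and the saxpy kernel they would contain; a real program would ask the CUDA driver for the device's compute capability and hand the chosen PTX to its module-loading API.)

```rust
// Runtime selection among AOT-compiled kernel variants.

/// AOT-compiled variants embedded in the binary. They could just as well
/// live on disk next to the executable ("disk is cheap").
const PTX_SM70: &str = include_str!("kernels/saxpy.sm_70.ptx");
const PTX_SM90: &str = include_str!("kernels/saxpy.sm_90.ptx");

/// Stand-in for a real device query (e.g. cuDeviceGetAttribute with the
/// compute-capability attributes). Returns (major, minor).
fn device_compute_capability() -> (u32, u32) {
    match std::env::var("SM_MAJOR").ok().and_then(|s| s.parse().ok()) {
        Some(major) => (major, 0),
        None => (7, 0),
    }
}

/// Pick the most specialized variant the device can run, falling back to
/// the lowest-common-denominator build.
fn select_ptx() -> (&'static str, &'static str) {
    match device_compute_capability() {
        (major, _) if major >= 9 => ("sm_90", PTX_SM90),
        _ => ("sm_70", PTX_SM70),
    }
}

fn main() {
    let (arch, ptx) = select_ptx();
    println!("loading {arch} variant ({} bytes of PTX)", ptx.len());
    // From here, pass `ptx` to whatever module-loading API you use
    // (the CUDA driver's cuModuleLoadData, or a wrapper crate).
}
```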