There is a plethora of packages to choose from, including DSLs for compute and MLIR:
https://developer.nvidia.com/how-to-cuda-python
https://cupy.dev/
There is also the talk "Zero to Hero: Programming Nvidia Hopper Tensor Core with MLIR's NVGPU Dialect" from EuroLLVM 2024:
https://www.youtube.com/watch?v=V3Q9IjsgXvA