I think the sweet spot is:
If your program is written in Rust, use an abstraction like Cudarc to send data to and receive data from the GPU. Write normal CUDA kernels.
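
To make the "normal CUDA kernels" half concrete, here is a minimal sketch of what the kernel side could look like. The kernel name `saxpy` and its parameters are made up for illustration, not taken from any particular project. The Rust side would own allocation, host-to-device copies, and the launch through Cudarc; that part is left out here because the exact calls depend on the Cudarc version you use.

```cuda
// Hypothetical example kernel: out[i] = a * x[i] + y[i].
// extern "C" disables C++ name mangling, so the compiled module exports
// the symbol as plain "saxpy" and the host can look it up by that name.
extern "C" __global__ void saxpy(float* out,
                                 const float* x,
                                 const float* y,
                                 float a,
                                 int n)
{
    // Global thread index across the 1D grid.
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // The grid is usually rounded up to a whole number of blocks,
    // so threads past the end of the arrays must do nothing.
    if (i < n) {
        out[i] = a * x[i] + y[i];
    }
}
```

The kernel stays ordinary CUDA C++ that you can build and profile with the usual NVIDIA tooling, while the Rust code only deals with buffers and launch parameters.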