Remix.run Logo
rowanG077 2 hours ago

That applies inside a single warp, notice the wording:

> In SIMT, all threads in the warp are executing the same kernel code, but each thread may follow different branches through the code. That is, though all threads of the program execute the same code, threads do not need to follow the same execution path.

This doesn't say anything about dependencies of multiple warps.

textlapse 2 hours ago | parent [-]

It's definitely possible, I am not arguing against that.

I am just saying it's not as flexible/cost-free as you would on a 'normal' von Neumann-style CPU.

I would love to see Rust-based code that obviates the need to write CUDA kernels (including compiling to different architectures). It feels icky to use/introduce things like async/await in the context of a GPU programming model which is very different from a traditional Rust programming model.

You still have to worry about different architectures and the streaming nature at the end of the day.

I am very interested in this topic, so I am curious to learn how the latest GPUs help manage this divergence problem.