textlapse an hour ago:
My understanding of warps (https://docs.nvidia.com/cuda/cuda-programming-guide/01-intro...) is that when threads in a warp diverge, you essentially pay the cost of executing both branches. I understand that newer GPUs have clever partitioning/pipelining so that block A takes branch A while block B takes branch B, with syncs/barriers and some smart 'oracle' scheduling it all in a way that still fits the SIMT model. It still doesn't feel Turing complete to me. Is there an NVIDIA doc you can refer me to?
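[Editor's note: a minimal sketch of the intra-warp cost being described; the kernel name and data layout are invented for illustration and are not from the linked doc. When the branch condition differs between lanes of a single warp, the hardware masks off lanes and runs each side of the branch in turn.]

    // Hypothetical example: odd and even lanes of the same warp diverge,
    // so the warp executes both branch bodies serially with lanes masked off.
    __global__ void divergent_within_warp(float *out)
    {
        int lane = threadIdx.x;      // lanes 0..31 share one warp
        if (lane % 2 == 0)
            out[lane] = 1.0f;        // even lanes active, odd lanes masked
        else
            out[lane] = 2.0f;        // odd lanes active, even lanes masked
    }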
rowanG077 an hour ago:
That applies inside a single warp; note the wording:

> In SIMT, all threads in the warp are executing the same kernel code, but each thread may follow different branches through the code. That is, though all threads of the program execute the same code, threads do not need to follow the same execution path.

This doesn't say anything about dependencies between multiple warps.
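[Editor's note: a minimal sketch of the distinction being drawn, with a hypothetical kernel name. If the branch condition is uniform across each warp, every lane of a given warp takes the same path, so no lane is masked and no warp executes both sides; the divergence cost from the parent comment only arises when lanes within one warp disagree.]

    // Hypothetical example: the condition depends only on the warp index,
    // so each warp takes exactly one side of the branch and never both.
    __global__ void uniform_per_warp(float *out)
    {
        int tid  = blockIdx.x * blockDim.x + threadIdx.x;
        int warp = tid / warpSize;   // warpSize is 32 on current NVIDIA GPUs
        if (warp % 2 == 0)
            out[tid] = 1.0f;         // whole warp takes this side
        else
            out[tid] = 2.0f;         // whole warp takes the other side
    }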