| ▲ | zozbot234 2 hours ago |
I think you're conflating GPU 'threads' and 'warps'. GPU 'threads' are SIMD lanes that all run the exact same instructions and control flow (only with different filtering/predication), whereas GPU warps are hardware-level threads that run on a single compute unit. There's no issue with adding extra "don't run this code" paths at warp granularity, unlike with GPU threads.
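As a minimal sketch of that lane/warp distinction (the kernel name, launch shape, and lane split below are made up for illustration): CUDA "threads" are lanes of a 32-wide warp, and inside a divergent branch the active-lane mask shows which lanes are currently predicated on.

    // Prints the active-lane mask seen inside a divergent branch.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void lane_mask_demo()
    {
        int lane = threadIdx.x & 31;   // lane index within the 32-wide warp
        int warp = threadIdx.x >> 5;   // warp index within the block

        if (lane < 8) {
            // Only lanes 0-7 are active here; lanes 8-31 are masked off.
            unsigned mask = __activemask();
            if (lane == 0)
                printf("warp %d inside branch: active mask = 0x%08x\n", warp, mask);
        }
    }

    int main()
    {
        lane_mask_demo<<<1, 64>>>();   // one block = two warps of 32 lanes
        cudaDeviceSynchronize();       // expected mask inside the branch: 0x000000ff
        return 0;
    }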
| ▲ | textlapse an hour ago |
My understanding of warps (https://docs.nvidia.com/cuda/cuda-programming-guide/01-intro...) is that you are essentially paying the cost of taking both branches. I understand that with newer GPUs you have clever partitioning/pipelining, such that block A takes branch A while block B takes branch B, with syncs/barriers essentially relying on some smart 'oracle' to schedule these in a way that still fits the SIMT model. It still doesn't feel Turing complete to me. Is there an NVIDIA doc you can refer me to?
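For what it's worth, here is a minimal sketch (kernel name, sizes, and the 16/16 lane split are made up) of that "paying for both branches" cost: when lanes of one 32-wide warp split on a condition, the warp executes branch A and then branch B with the inactive lanes masked off on each pass, whereas a condition that is uniform per warp or per block incurs no divergence penalty.

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void divergent_kernel(float *out, const float *in, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        // Lanes 0-15 of each warp take branch A, lanes 16-31 take branch B.
        // Within one warp both bodies execute serially, each pass with the
        // lanes on the "wrong" side predicated off, so the warp pays for both.
        if ((threadIdx.x & 31) < 16) {
            out[i] = in[i] * 2.0f;     // branch A
        } else {
            out[i] = in[i] + 1.0f;     // branch B
        }
        // If the condition depended only on the warp (or block) index, each
        // warp would run exactly one body and pay no divergence penalty.
        // On Volta and later, independent thread scheduling lets the two
        // paths interleave, but the warp still executes both of them.
    }

    int main()
    {
        const int n = 1 << 20;
        float *in, *out;
        cudaMallocManaged(&in, n * sizeof(float));
        cudaMallocManaged(&out, n * sizeof(float));
        for (int i = 0; i < n; ++i) in[i] = 3.0f;

        divergent_kernel<<<(n + 255) / 256, 256>>>(out, in, n);
        cudaDeviceSynchronize();

        printf("branch A result: %f, branch B result: %f\n", out[0], out[16]);
        cudaFree(in);
        cudaFree(out);
        return 0;
    }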