▲ | camel-cdr 5 days ago | |
I'm not aware of any GPU that implements this. Even the interleaved execution introduced in Volta can still only execute one type of instruction at a time [1]. This feature wasn't meant to accelerate code, but to allow more composable programming models [2]. Going off the diagram, it looks equivalent to rapidly switching between predicates, not executing two different operations at once.
The diagram shows how this executes in the following order:
Volta:
pre Volta:
The SIMD equivalent of pre Volta is:
The Volta model is:
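As a rough illustration of the two orderings (not the code from the original comment), here is a toy Python model of one 8-lane warp running `if lane < 4: A; B else: X; Y` followed by `Z`. The warp size, the instruction names, and the assumption that both models reconverge at `Z` are all made up for the sketch; on real Volta hardware reconvergence may need an explicit `__syncwarp()`.

```python
# Toy model: one instruction type issues per step, under a per-lane mask.
# WARP, the op names and the reconvergence at Z are illustrative assumptions.
WARP = 8
then_mask = [lane < 4 for lane in range(WARP)]
else_mask = [not m for m in then_mask]
full_mask = [True] * WARP

def pre_volta():
    # Serialize the branches: finish one side, then the other, then reconverge.
    return [("A", then_mask), ("B", then_mask),
            ("X", else_mask), ("Y", else_mask),
            ("Z", full_mask)]

def volta_interleaved():
    # Same instructions, but the scheduler may alternate between the two sides.
    # Each step still issues exactly one instruction type.
    return [("A", then_mask), ("X", else_mask),
            ("B", then_mask), ("Y", else_mask),
            ("Z", full_mask)]

def active_lane_steps(schedule):
    # Total useful work: number of (step, lane) pairs where the lane is active.
    return sum(sum(mask) for _, mask in schedule)

def show(schedule):
    # Print each issued op with its lane mask, e.g. "A 1111....".
    for op, mask in schedule:
        print(op, "".join("1" if m else "." for m in mask))
```

In this model both schedules take the same number of issue steps and do the same amount of lane work, which is the point above: interleaving changes the order in which predicates are switched, not how many operations run at once.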
[1] https://chipsandcheese.com/i/138977322/shader-execution-reor...
[2] https://stackoverflow.com/questions/70987051/independent-thr...
▲ | namibj 5 days ago | parent [-] | |
IIUC Volta brought the ability to run a tail-call state machine (with, let's presume, identically expensive states and a state count less than threads-per-warp) at an average goodput of more than one thread actually active. Before that, it would lose all parallelism, as it couldn't handle different threads having truly different/separate control flow, emulating dumb-mode via predicated execution/lane-masking.
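The "more than one thread active" part is just a pigeonhole argument, which a small Python sketch can make concrete. It assumes a hypothetical scheduler that issues one state per step to every lane currently in that state, with all states costing the same; `average_active_lanes` is an illustrative name, not a real API.

```python
from collections import Counter

def average_active_lanes(states):
    """states[i] is the current state of lane i.

    Assumed model: each distinct state takes one issue step, and that step
    activates every lane currently in that state.
    """
    groups = Counter(states)
    # Every lane is active in exactly one of the len(groups) issue steps.
    return len(states) / len(groups)

# 32 lanes spread over 5 states (hypothetical numbers).
warp_states = [lane % 5 for lane in range(32)]
avg = average_active_lanes(warp_states)
```

With fewer distinct states than lanes, at least two lanes share a state, so the average is strictly above one; when every lane is in its own state it drops to exactly one.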