Remix.run Logo
namibj 5 days ago

IIUC volta brought the ability to run a tail call state machine with let's presume identically-expensive states and state count less than threads-per-warp, at an average goodput of more than one thread actually active.

Before it would loose all parallelism as it couldn't handle different threads having truly different/separate control flow, emulating dumb-mode via predicated execution/lane-masking.