pornel 10 hours ago

CPUs are not good at branchy code either. Branch mispredictions cause costly pipeline stalls, so you have to either make branches predictable or use conditional moves. Trivially predictable branches are fast, but so are non-diverging warps on GPUs. Conditional moves and masked SIMD work pretty much exactly like on a GPU.
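To make the "conditional moves and masked SIMD" point concrete, here is a rough sketch of the same loop written with a branch and with a mask. The function names and the threshold-sum task are made up for illustration. Whether the compiler actually emits a jump, a `cmov`, or vector code for either version depends on the target and optimization level, so treat this as showing the idea rather than guaranteed codegen.

```rust
/// Branchy version: the `if` may compile to a conditional jump,
/// which is cheap only when the predictor guesses right.
pub fn sum_above_branchy(data: &[i32], threshold: i32) -> i64 {
    let mut sum = 0i64;
    for &x in data {
        if x > threshold {
            sum += x as i64;
        }
    }
    sum
}

/// Masked version: turn the comparison into an all-ones or all-zeros
/// mask and AND it with the value, so every element does identical work.
/// This is the same trick a GPU uses for lanes that are masked off.
pub fn sum_above_masked(data: &[i32], threshold: i32) -> i64 {
    let mut sum = 0i64;
    for &x in data {
        // true -> -1 (all bits set), false -> 0
        let mask = -((x > threshold) as i64);
        sum += (x as i64) & mask;
    }
    sum
}
```

Both functions return the same result for any input. The masked one does not depend on how predictable the data is, at the price of always doing the add.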

Even if you have a branchy divide-and-conquer problem ideal for diverging threads, you'll get hit by a relatively high overhead of distributing work across threads, false sharing, and stalls from cache misses.
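As a rough illustration of the false-sharing and work-distribution costs, here is a sketch of a parallel count that gives each worker its own padded slot. `PaddedCounter`, `count_even_parallel`, and the 64-byte cache line size are assumptions made for the example (line size varies by CPU), and spawning threads per call is itself the kind of overhead mentioned above.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// One counter per worker, aligned to 64 bytes so that two workers'
/// counters never share a cache line (64 is an assumed line size).
#[repr(align(64))]
pub struct PaddedCounter(AtomicU64);

/// Counts even numbers by splitting `data` across `workers` threads.
pub fn count_even_parallel(data: &[u32], workers: usize) -> u64 {
    let workers = workers.max(1);
    let counters: Vec<PaddedCounter> = (0..workers)
        .map(|_| PaddedCounter(AtomicU64::new(0)))
        .collect();
    // Ceiling division, clamped so `chunks` never receives 0.
    let chunk = ((data.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        for (part, counter) in data.chunks(chunk).zip(&counters) {
            s.spawn(move || {
                // Accumulate in a local variable and publish once, so the
                // shared slot is written a single time per worker.
                let local = part.iter().filter(|&&x| x % 2 == 0).count() as u64;
                counter.0.store(local, Ordering::Relaxed);
            });
        }
    }); // all threads are joined here, so the stores are visible below

    counters.iter().map(|c| c.0.load(Ordering::Relaxed)).sum()
}
```

For small inputs the thread setup will likely cost more than the counting itself, which is the overhead problem in miniature.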

My hot take is that GPUs will get more features to work better on traditionally CPU problems (e.g. the AMD Shader Call proposal, which helps with processing unbalanced tree-structured data), and CPUs will be downgraded to being just a coprocessor for bootstrapping the GPU drivers.