physicsguy 5 days ago

It really isn't that hard to pivot. It's worth saying that if you were already writing OpenMP and MPI code, then learning CUDA wasn't particularly difficult to get started, and learning to write more performant CUDA code would also help you write faster CPU-bound code. It's an evolution of existing models of compute, not a revolution.

Q6T46nT668w6i3m 5 days ago | parent [-]

I agree that “learning CUDA wasn’t particularly difficult to get started,” but there are Grand Canyon-sized chasms between CUDA and its alternatives when you try to crank up performance.

physicsguy 4 days ago | parent | next [-]

Well, I think to a degree that depends on what you're targeting.

Single socket 8 core CPU? Yes.

If you've spent some time trying to eke out performance on Xeon Phi, written NUMA-aware code for multi-socket boards, and optimised for the L1/L2/L3 memory hierarchy, then it really isn't that different.

j45 5 days ago | parent | prev [-]

It will improve for sure, but this shouldn’t be downplayed.