zozbot234 | 3 hours ago
I'm not quite seeing the real benefit of this. Is the idea that warps will now be able to do work-stealing and continuation-stealing when running heterogeneous parallel workloads? But that requires keeping the async function's state in GPU-wide shared memory, which is generally a scarce resource.
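To put rough numbers on the scarcity: the compiler lowers an async fn into a state-machine enum, and every suspended task has to park one of those somewhere. A back-of-envelope sketch in plain Rust (the state layout and the 48 KiB per-SM figure are illustrative assumptions, not measurements):

    use std::mem::size_of;

    // Stand-in for a compiled async fn's state machine: an enum holding
    // the locals that are live across each .await point. Sizes are made up.
    enum FakeAsyncState {
        Start { input: [f32; 16] },
        AwaitingDma { buf: [f32; 64], offset: u32 },
        Done,
    }

    fn main() {
        // 48 KiB is a common per-SM shared memory budget; real limits
        // vary by architecture and launch configuration.
        const SHARED_MEM: usize = 48 * 1024;
        let per_task = size_of::<FakeAsyncState>();
        println!("{} bytes/task -> at most {} suspended tasks per SM",
                 per_task, SHARED_MEM / per_task);
    }

A few hundred bytes of live state per task eats the whole budget after a couple hundred suspended tasks, and that's before the kernel's own shared-memory use.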
nxobject | 44 minutes ago
God, as someone who took their elective on graphics programming when GPGPU and compute shaders first became a thing, reading this makes me realize I definitely need an update on what modern GPU uarchs are like now. Re: heterogeneous workloads: I'm told by a friend in HPC that the old advice about avoiding diverging branches within warps is no longer much of an issue – is that true?
| ||||||||
pjmlp | an hour ago
This is already happening in C++: NVIDIA is the one pushing the senders/receivers proposal, which is one of the possible coroutine runtimes to be added to the C++ standard library.
LegNeato | 3 hours ago
Yes, that's the idea. GPU-wide memory is not quite as scarce on datacenter cards or systems with unified memory. One could also have local executors with local futures that are `!Send` and place them in a faster address space.
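A minimal sketch of what such a local executor could look like in plain Rust; `LocalExecutor` and its methods are hypothetical names, not rust-gpu's actual API, and on a GPU target the task queue could live in the fast local address space instead of a heap Vec:

    use std::future::Future;
    use std::pin::Pin;
    use std::task::{Context, Waker};

    // Queue of futures that are allowed to be !Send: everything stays
    // within one "thread" (one block/warp on a GPU), so no Send bound.
    struct LocalExecutor {
        tasks: Vec<Pin<Box<dyn Future<Output = ()>>>>,
    }

    impl LocalExecutor {
        fn new() -> Self { Self { tasks: Vec::new() } }

        fn spawn(&mut self, fut: impl Future<Output = ()> + 'static) {
            self.tasks.push(Box::pin(fut));
        }

        // Busy-poll every task until all complete; a real executor would
        // park tasks and wake them from completion signals instead.
        fn run(&mut self) {
            let mut cx = Context::from_waker(Waker::noop());
            while !self.tasks.is_empty() {
                self.tasks
                    .retain_mut(|t| t.as_mut().poll(&mut cx).is_pending());
            }
        }
    }

    fn main() {
        let mut ex = LocalExecutor::new();
        let local = std::rc::Rc::new(42); // Rc is !Send, so this future is !Send
        ex.spawn(async move { println!("value = {}", local) });
        ex.run();
    }

Because nothing crosses an executor boundary, the future can hold `Rc`, raw pointers into local storage, etc., which is exactly what lets its state live in a faster, non-shared address space.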
jmalicki | an hour ago
A ton of GPU workloads require leaving large amounts of data resident in GPU RAM and repeatedly running computations as new data streams in from the CPU.
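That pattern in a nutshell, as a plain-Rust mock (`Gpu`, `upload`, and `launch` are hypothetical stand-ins, not any real GPU crate's API):

    // Mock of the "resident weights, streamed inputs" pattern.
    struct Gpu;
    struct DeviceBuf(Vec<u8>); // stand-in for a device allocation

    impl Gpu {
        fn upload(&self, data: &[u8]) -> DeviceBuf { DeviceBuf(data.to_vec()) }
        fn launch(&self, _resident: &DeviceBuf, _input: &DeviceBuf) {
            // kernel launch would go here
        }
    }

    fn main() {
        let gpu = Gpu;
        // The big allocation is uploaded once and stays resident...
        let weights = gpu.upload(&vec![0u8; 8 << 20]);
        // ...while each step only moves a small fresh batch from the CPU.
        for batch in [vec![1u8; 4096], vec![2u8; 4096]] {
            let input = gpu.upload(&batch);
            gpu.launch(&weights, &input);
        }
    }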