| ▲ | Async/Await on the GPU(vectorware.com) |
| 70 points by Philpax 2 hours ago | 13 comments |
| |
|
| ▲ | zozbot234 an hour ago | parent | next [-] |
| I'm not quite seeing the real benefit of this. Is the idea that warps will now be able to do work-stealing and continuation-stealing when running heterogeneous parallel workloads? But that requires keeping the async function's state in GPU-wide shared memory, which is generally a scarce resource. |
| |
| ▲ | jmalicki a few seconds ago | parent | next [-] | | A ton of GPU workloads require leaving large amounts of data resident in GPU RAM and re-running computation as new data arrives from the CPU. | |
| ▲ | LegNeato an hour ago | parent | prev [-] | | Yes, that's the idea. GPU-wide memory is not quite as scarce on datacenter cards or systems with unified memory. One could also have local executors with local futures that are `!Send` and place them in a faster address space. |
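| To make the local-executor idea concrete, here is a minimal sketch of a single-threaded executor whose task queue holds `!Send` futures. All names are illustrative, not vectorware's actual API; on a GPU, the analogous executor could keep this state in a faster, block-local address space: |

    // Sketch only: a single-threaded executor over !Send futures. Nothing
    // here crosses a thread boundary, so the futures' state never has to
    // be visible to other threads (or, on a GPU, to other warps/blocks).
    use std::cell::RefCell;
    use std::future::Future;
    use std::pin::Pin;
    use std::rc::Rc;
    use std::task::{Context, Waker};

    type LocalBoxFuture = Pin<Box<dyn Future<Output = ()>>>; // note: no + Send bound

    struct LocalExecutor {
        tasks: RefCell<Vec<LocalBoxFuture>>,
    }

    impl LocalExecutor {
        fn spawn(&self, fut: impl Future<Output = ()> + 'static) {
            // No Send bound, so Rc-holding (!Send) futures are accepted.
            self.tasks.borrow_mut().push(Box::pin(fut));
        }

        fn run(&self) {
            // Busy-poll every task to completion (crude, like the
            // inefficient polling the post mentions).
            let mut cx = Context::from_waker(Waker::noop()); // stable since Rust 1.85
            while !self.tasks.borrow().is_empty() {
                let mut tasks = self.tasks.take();
                tasks.retain_mut(|t| t.as_mut().poll(&mut cx).is_pending());
                self.tasks.borrow_mut().extend(tasks);
            }
        }
    }

    fn main() {
        let ex = LocalExecutor { tasks: RefCell::new(Vec::new()) };
        let counter = Rc::new(RefCell::new(0)); // Rc makes the future !Send
        let c = counter.clone();
        ex.spawn(async move { *c.borrow_mut() += 1 });
        ex.run();
        assert_eq!(*counter.borrow(), 1);
    }

| Because no task ever leaves this executor, the `Rc`-holding future is fine, and the executor's state can live wherever is fastest for the locality that owns it. |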
|
|
| ▲ | the__alchemist 3 minutes ago | parent | prev | next [-] |
| Et tu, GPU? I am, bluntly, sick of Async taking over Rust ecosystems. Embedded and web/HTTP have already fallen. I'm optimistic this won't take hold in GPU code; we'll see. Async splits the ecosystem. I see it as the biggest threat to Rust staying a useful tool. I do a lot of Rust on the GPU for all of the following: 3D graphics, cuFFT via FFI, custom kernels via Cudarc, and ML via Burn and Candle. Thankfully these are all Async-free. |
|
| ▲ | Arch485 35 minutes ago | parent | prev | next [-] |
| Very cool! Is the goal with this project (generally, not specifically async) to have an equivalent to e.g. CUDA, but in Rust? Or is there another intended use-case that I'm missing? |
|
| ▲ | shayonj 2 hours ago | parent | prev | next [-] |
| Very cool to see this. It's something I have been curious about myself, and I've been exploring the space as well. I'd be curious what the parallels and differences are between this and NVIDIA's stdexec (outside of it being in Rust and using Future, which is also cool). |
|
| ▲ | textlapse an hour ago | parent | prev | next [-] |
| What's the performance like? What would the benefits be of converting a streaming-multiprocessor programming model to this? |
| |
| ▲ | LegNeato an hour ago | parent [-] | | We aren't focused on performance yet (it is often workload- and executor-dependent, and as the post says we currently do some inefficient polling), but Rust futures compile down to state machines, so they are a zero-cost abstraction. The anticipated benefits are similar to those of async/await on the CPU: better ergonomics for the developer writing concurrent code, better utilization of shared/limited resources, and fewer concurrency bugs. | | |
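| As a sketch of that lowering (illustrative, not actual compiler output): an `async fn` with one `.await` becomes, roughly, an enum with a variant per suspension point, and locals that live across the `.await` become fields: |

    use std::future::Future;
    use std::pin::Pin;
    use std::task::{Context, Poll};

    // What you write:
    //     async fn double_later(x: u32) -> u32 {
    //         yield_once().await; // suspension point
    //         x * 2
    //     }

    /// A toy future that is Pending on the first poll, Ready on the second.
    struct YieldOnce(bool);

    impl Future for YieldOnce {
        type Output = ();
        fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
            if self.0 {
                Poll::Ready(())
            } else {
                self.0 = true;
                cx.waker().wake_by_ref(); // ask to be polled again
                Poll::Pending
            }
        }
    }

    // Roughly what the compiler generates for double_later:
    enum DoubleLater {
        // `x` lives across the .await, so it becomes a field.
        Awaiting { x: u32, fut: YieldOnce },
        Done,
    }

    impl Future for DoubleLater {
        type Output = u32;
        fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
            match &mut *self {
                DoubleLater::Awaiting { x, fut } => match Pin::new(fut).poll(cx) {
                    Poll::Ready(()) => {
                        let x = *x;
                        *self = DoubleLater::Done;
                        Poll::Ready(x * 2)
                    }
                    Poll::Pending => Poll::Pending,
                },
                DoubleLater::Done => panic!("polled after completion"),
            }
        }
    }

| Polling the enum is an ordinary match on its state, which is why the abstraction adds no overhead beyond the state machine you would have written by hand. |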
| ▲ | textlapse 19 minutes ago | parent [-] | | Warp divergence is expensive: essentially the hardware runs 'don't execute this code' masks on inactive lanes to maintain SIMT. GPUs are still not practically Turing-complete in the sense that there are strict restrictions on loops/goto/IO/waiting (there are a bunch of band-aids to make it pretend otherwise). So I am not sure retrofitting a Ferrari to cosplay as an Amazon delivery van is useful other than as a tech showcase? Good tech showcase though :) |
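| For readers unfamiliar with the masking point, here is a sketch (a CPU simulation, not GPU code) of what SIMT divergence effectively costs: the warp steps through both sides of a branch with inactive lanes masked off, so the time taken is the sum of both branches: |

    const WARP_SIZE: usize = 32;

    fn simulate_divergent_warp(input: &[i32; WARP_SIZE]) -> [i32; WARP_SIZE] {
        let mut out = [0; WARP_SIZE];
        // The per-lane branch condition, i.e. the active mask.
        let take_then: [bool; WARP_SIZE] =
            std::array::from_fn(|lane| input[lane] % 2 == 0);

        // Pass 1: the whole warp steps through the "then" branch;
        // lanes where take_then is false are masked off but still wait.
        for lane in 0..WARP_SIZE {
            if take_then[lane] {
                out[lane] = input[lane] * 2;
            }
        }
        // Pass 2: the whole warp steps through the "else" branch;
        // now the other lanes wait. Divergence costs then + else.
        for lane in 0..WARP_SIZE {
            if !take_then[lane] {
                out[lane] = input[lane] + 1;
            }
        }
        out
    }

    fn main() {
        let input: [i32; WARP_SIZE] = std::array::from_fn(|lane| lane as i32);
        let out = simulate_divergent_warp(&input);
        assert_eq!(out[2], 4); // even lane took the "then" branch
        assert_eq!(out[3], 4); // odd lane took the "else" branch
    }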
|
|
|
| ▲ | firefly2000 an hour ago | parent | prev [-] |
| Is this Nvidia-only or does it work on other architectures? |
| |
| ▲ | LegNeato an hour ago | parent [-] | | Currently NVIDIA-only, though we're cooking up some Vulkan stuff in rust-gpu. | | |
| ▲ | monster_truck 41 minutes ago | parent | next [-] | | I have nothing to offer but encouragement; there are _dozens_ of ROCm enjoyers out there. In years prior I wouldn't have even bothered, but it's 2026 and AMD's drivers actually come with a recent version of torch that 'just works' on Windows. Anything is possible :) | |
| ▲ | firefly2000 14 minutes ago | parent | prev [-] | | Does the lack of forward-progress guarantees (i.e., no ITS) on other architectures pose challenges for async/await? |
|
|