| ▲ | kookamamie 6 days ago |
| Exactly. Not sure why it would be better to run Rust on Nvidia GPUs compared to actual CUDA code. I get the idea of added abstraction, but I do think it becomes a bit jack-of-all-tradesey. |
|
| ▲ | rbanffy 6 days ago | parent | next [-] |
| I think the idea is to allow developers to write a single implementation and have a portable binary that can run on any kind of hardware. We do that all the time - there is lots of code that chooses optimal code paths depending on the runtime environment or on which ISA extensions are available. |
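On the CPU side this is the familiar runtime-dispatch pattern; a minimal sketch in Rust, assuming an x86_64 host (sum_avx2 is a hypothetical AVX2 path, not from any particular crate):

    /// Sums a slice, picking an AVX2 path at runtime when the CPU supports it.
    fn sum(xs: &[f32]) -> f32 {
        #[cfg(target_arch = "x86_64")]
        {
            if is_x86_feature_detected!("avx2") {
                // Safety: the runtime check above guarantees AVX2 is available.
                return unsafe { sum_avx2(xs) };
            }
        }
        // Portable fallback for every other target.
        xs.iter().sum()
    }

    #[cfg(target_arch = "x86_64")]
    #[target_feature(enable = "avx2")]
    unsafe fn sum_avx2(xs: &[f32]) -> f32 {
        // Illustrative stand-in; a real implementation would use _mm256_* intrinsics.
        xs.iter().sum()
    }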
| |
| ▲ | pjmlp 6 days ago | parent | next [-] | | Without the tooling, though. Commendable effort; however, just as people forget that languages are ecosystems, they tend to forget that APIs are ecosystems as well. | |
| ▲ | kookamamie 6 days ago | parent | prev [-] | | Sure. The performance-purist in me would be very doubtful about the result's optimality, though. | | |
| ▲ | littlestymaar 6 days ago | parent [-] | | The performance purists don't use CUDA either, though (that's why DeepSeek used PTX directly). Everything is an abstraction, and choosing the right level of abstraction for your use case is a tradeoff between your engineering capacity and your performance needs. | | |
| ▲ | LowLevelMahn 6 days ago | parent | next [-] | | This Rust demo also uses PTX directly. During the build, build.rs uses rustc_codegen_nvvm to compile the GPU kernel to PTX.
The resulting PTX is embedded into the CPU binary as static data.
The host code is compiled normally.
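Roughly, the host side of that pipeline might look like the sketch below. This is a guess at the shape of the code, assuming the cust crate's Module::from_ptx / get_function API; the OUT_DIR file name and the "add" kernel are illustrative, not the demo's actual names.

    use cust::prelude::*;

    // PTX emitted by rustc_codegen_nvvm in build.rs, baked into the binary as
    // static data. The kernels.ptx path here is hypothetical.
    static KERNEL_PTX: &str = include_str!(concat!(env!("OUT_DIR"), "/kernels.ptx"));

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let _ctx = cust::quick_init()?;                   // create a CUDA context
        let module = Module::from_ptx(KERNEL_PTX, &[])?;  // JIT the embedded PTX
        let _kernel = module.get_function("add")?;        // look up a kernel by name
        // ...allocate device buffers and launch via the launch! macro...
        Ok(())
    }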
| | |
| ▲ | LegNeato 6 days ago | parent | next [-] | | To be more technically correct, we compile to NVVM IR and then use NVIDIA's NVVM to convert it to PTX. | |
| ▲ | saagarjha 5 days ago | parent | prev [-] | | That’s not really the same thing; it compiles through PTX rather than using inline assembly. | | |
| |
| ▲ | brandonpelfrey 6 days ago | parent | prev [-] | | The issue in my mind is that this doesn’t seem to include any of the critical library functionality specific to, e.g., NVIDIA cards; think reduction operations across threads in a warp and similar. Some of those don’t exist on all hardware architectures. We may get to a point where everything can be written in one language, but actually leveraging the hardware correctly still requires a bunch of different implementations, one for each target architecture. The fact that different hardware has different features is a good thing. | | |
| ▲ | rbanffy 6 days ago | parent [-] | | Features that lack hardware support can fall back to software implementations. In any case, ideally, the level of abstraction would be higher, with little application logic requiring GPU-architecture awareness. |
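A rough illustration of that fallback idea in plain Rust (nothing GPU-specific; the trait and type names are made up): a portable software default that a backend with the matching hardware primitive would override.

    /// Warp-style reduction abstracted behind a trait.
    trait WarpReduce {
        /// Default: a portable software fallback.
        fn reduce_sum(&self, lane_values: &[f32]) -> f32 {
            lane_values.iter().sum()
        }
    }

    /// A backend whose hardware has a native shuffle/reduce instruction would
    /// override the default method with a call to that intrinsic.
    struct SoftwareBackend;
    impl WarpReduce for SoftwareBackend {}

    fn main() {
        let total = SoftwareBackend.reduce_sum(&[1.0, 2.0, 3.0, 4.0]);
        assert_eq!(total, 10.0);
    }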
|
| ▲ | MuffinFlavored 6 days ago | parent | prev | next [-] |
| > Exactly. Not sure why it would be better to run Rust on Nvidia GPUs compared to actual CUDA code.
You get to pull in no_std Rust crates and run them on the GPU, instead of having to convert them to C++. |
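For example, a tiny #![no_std] helper like the one below could, in principle, be compiled for both the host and (via rustc_codegen_nvvm) the GPU target; the crate and function are purely illustrative.

    // lib.rs of a hypothetical #![no_std] crate shared by CPU and GPU code.
    #![no_std]

    /// Clamp to [0, 1]; depends only on core, so it can be reused from a GPU kernel.
    pub fn clamp01(x: f32) -> f32 {
        if x < 0.0 {
            0.0
        } else if x > 1.0 {
            1.0
        } else {
            x
        }
    }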
|
| ▲ | JayEquilibria 6 days ago | parent | prev | next [-] |
| Good stuff. I have been thinking of learning Rust because of people here, even though CUDA is what I care about. My abstractions, though, are probably best served by PyTorch and Julia, so Rust is just a waste of time, FOR ME. |
|
| ▲ | the__alchemist 6 days ago | parent | prev | next [-] |
| I think the sweet spot is: if your program is written in Rust, use an abstraction like cudarc to send and receive data from the GPU, and write normal CUDA kernels. |
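Something along these lines, assuming the cudarc crate's driver API (method names here are from memory and vary between cudarc versions):

    use cudarc::driver::CudaDevice;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let dev = CudaDevice::new(0)?;                    // open the first GPU
        let host = vec![1.0f32; 1024];
        let dbuf = dev.htod_copy(host)?;                  // host -> device copy
        // ...launch a kernel compiled separately from a normal .cu file...
        let back: Vec<f32> = dev.dtoh_sync_copy(&dbuf)?;  // device -> host copy
        assert_eq!(back.len(), 1024);
        Ok(())
    }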
|
| ▲ | Ar-Curunir 6 days ago | parent | prev [-] |
| Because folks like to program in Rust, not CUDA |
| |
| ▲ | tucnak 6 days ago | parent [-] | | “Folks” as in Rust stans, who know very little about CUDA and what makes it nice in the first place, sure, but is there demand for Rust ports amongst actual CUDA programmers? I think not. | | |
| ▲ | LegNeato 6 days ago | parent | next [-] | | FYI, rust-cuda outputs NVVM so it can integrate with the existing CUDA ecosystem. We aren't suggesting rewriting everything in Rust. Check the repo for crates that allow using existing stuff like cuDNN and cuBLAS. | | |
| ▲ | apitman 6 days ago | parent | next [-] | | Do you have a link? I went to the Rust-GPU repo and didn't see anything. I have an academic pipeline that is currently heavily tied to CUDA because we need nvCOMP. Eventually we hope that we or someone else will make an open-source gzip library for GPUs. Until then, it would at least be nice if we could implement our other pipeline stages in something more open. | | | |
| ▲ | tucnak 6 days ago | parent | prev [-] | | I take it you're the maintainer. Firstly, congrats on the work done; open-source people are a small crowd, and the determination of the Rust teams here is commendable. On the other hand, I'm struggling to see the unique value proposition. What is your motivation with Rust-GPU? Graphics or general-purpose computing? If it's the latter, at least from my POV, I would struggle to justify going up against a daunting umbrella project like this, in view of it likely culminating in layers upon layers of abstraction. Is the long-term goal here to have fun writing a bit of Rust, or to upset the entrenched status quo of highly concurrent GPU programming? There's a saying that goes something like "pleasing all is a lot like pleasing none," and intuitively I would guess it applies here. | | |
| ▲ | LegNeato 6 days ago | parent [-] | | Thanks! I'm personally focused on compute, while other contributors are focused on graphics. I believe GPUs are the future of computing. I think the tooling, languages, and ecosystems of GPUs are very bad compared to CPUs. Partially because they are newer, partially because they are different, and partially because for some reason the expectations are so low. So I intend to upset the status quo. | | |
| ▲ | tucnak 6 days ago | parent [-] | | Have you considered post-GPU accelerators? For large-scale machine learning, TPUs have won, basically. There are new vendors like Tenstorrent offering completely new (and much simpler) computing hardware. GPUs may well be living on borrowed time as far as compute is concerned. | | |
| ▲ | LegNeato 5 days ago | parent [-] | | Yes, see the proposed WG link I posted above. When I say GPU, I'm using it as shorthand... indeed, I think the "graphics" part is on borrowed time and will just be done fully in software. It is already happening. | | |
| ▲ | tucnak 5 days ago | parent [-] | | Tenstorrent is often criticised for having lots of abstraction layers, compilers, and IRs in the middle; it's all in C++, of course. GPUs are okay, but none of them has network-on-chip capability. Some promising papers have been coming out, like SystolicAttention, etc. There's just so much stuff for GPUs, but not that much for systolic NoC systems (TPUs, TT, NPUs). I think Rust could really make an impact here. Abandon all the GPU deadweight, stick to simple abstractions, assume a 3D twisted torus for the topology, and that's it. Food for thought! |
|
| ▲ | Ar-Curunir 6 days ago | parent | prev | next [-] | | Rust expanded systems programming to a much larger audience. If it can do the same for GPU programming, even if the resulting programs are not (initially) as fast as CUDA programs, that's a big win. | |
| ▲ | tayo42 6 days ago | parent | prev [-] | | What makes CUDA nice in the first place? | | |
| ▲ | tucnak 6 days ago | parent [-] | | All the things marked with a red cross in the Rust-CUDA compatibility matrix. https://github.com/Rust-GPU/Rust-CUDA/blob/main/guide/src/fe... | | |
| ▲ | dvtkrlbs 6 days ago | parent [-] | | I mean, that will only improve with time though, no? Maintainers recently revived the rust-gpu and rust-cuda backends. I don't think even the maintainer would say this is ready for prime time. Another benefit is being able to run the same code (library, aka crate) on the CPU and the GPU. This would require really good optimization on the GPU backend to get the full benefit, but I definitely see the value proposition and the potential. | | |
| ▲ | apitman 6 days ago | parent [-] | | Those red Xs are libraries that only work on Nvidia GPUs and would take a massive amount of work to re-implement in a cross-platform way, and you may never achieve the same performance, either because of the abstraction or because you can't match the engineering resources Nvidia has thrown at this over the past decade. This is their moat. I think open-source alternatives will come in time, but it's a lot. | | |
| ▲ | dvtkrlbs 5 days ago | parent | next [-] | | I don't see any of the Xs for which it would not be possible to generate code and expose compiler intrinsics. You don't reinvent the wheel here; you generate the NVVM bytecode and let Nvidia handle the rest. | | |
| ▲ | dvtkrlbs 5 days ago | parent [-] | | Wait, my browser scrolled to the wrong place. For libraries it is even easier to create or write bindings, and as the status says, several are already in progress. |
| |
| ▲ | pjmlp 6 days ago | parent | prev [-] | | Plus IDE integration, GPU graphical debugging with the same quality as Visual Studio debugging, and a polyglot ecosystem. |
|