vouwfietsman 6 days ago

Certainly impressive that this is possible!

However, for my use cases (running on arbitrary client hardware) I generally distrust any abstractions over the GPU APIs, as the entire point is to leverage the low-level details of the GPU. Treating those details as a nuisance leads to bugs and performance loss, because each target is meaningfully different.

To overcome this, a similar system should be brought forward by the vendors. However, since they failed to settle their arguments, I imagine the platform differences are significant. There are exceptions to this (e.g. ANGLE), but they only arrive at stability by limiting the feature set (and thus performance).

It's good that this approach at least allows conditional compilation; that helps for sure.

LegNeato 6 days ago | parent | next [-]

Rust is a system language, so you should have the control you need. We intend to bring GPU details and APIs into the language and core / std lib, and expose GPU and driver stuff to the `cfg()` system.

(Author here)
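As a rough sketch of what exposing target details to `cfg()` could look like: `target_arch = "nvptx64"` is a real (tier 3) rustc target today, while the `gpu_vendor` key below is purely hypothetical.

  // Real today: conditionally compile on the nvptx64 GPU target.
  #[cfg(target_arch = "nvptx64")]
  fn preferred_block_size() -> u32 {
      256 // tuned around NVIDIA's 32-thread warps
  }

  #[cfg(not(target_arch = "nvptx64"))]
  fn preferred_block_size() -> u32 {
      64 // fallback for the host or other targets
  }

  // Hypothetical future syntax (not valid today), the kind of driver/vendor
  // detail that could be exposed through cfg():
  // #[cfg(gpu_vendor = "amd")]
  // fn preferred_block_size() -> u32 { 64 }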

vouwfietsman 5 days ago | parent | next [-]

> Rust is a system language, so you should have the control you need

I don't think this argument is promising. It's not about the power of the language, but the power of the abstractions you provide over the various GPU APIs.

In fact, I could argue one of the main selling points of Rust (memory safety) has limited applicability in GPU land, because lifetimes are not a thing like they are in CPU land.

I'm sure there are other benefits here, not least the tooling, but certainly the language is not the main selling point...

Voultapher 6 days ago | parent | prev | next [-]

Who is "we" here? I'm curious to hear more about your ambitions, since surely pulling in wgpu or something similar seems out of scope for the traditionally lean Rust stdlib.

LegNeato 6 days ago | parent [-]

Many of us working on Rust + GPUs in various projects have discussed starting a GPU working group to explore some of these questions:

https://gist.github.com/LegNeato/a1fb3e3a9795af05f22920709d9...

Agreed, I don't think we'd ever pull in things like wgpu, but we might create APIs or traits that wgpu could use to improve perf/safety/ergonomics/interoperability.

jpc0 6 days ago | parent | next [-]

Here's an idea:

Get Nvidia, AMD, Intel, and whoever else you can into a room. Get the LLVM folks into the same room.

Compile LLVM IR directly into hardware instructions fed to the GPU; get them to open up.

Having to target an API is part of the problem. Get them to allow you to write Rust that compiles directly into the code that will run on the GPU, not something that becomes something else, which becomes SPIR-V, which controls a driver that will eventually run it on the GPU.

Ygg2 6 days ago | parent | next [-]

Hell will freeze over, then go into negative Kelvin temperatures, before you see Nvidia agreeing in earnest to do so. They make too much money on NOT GETTING COMMODITIZED. Nvidia even changed CUDA to make the API incompatible with interpreters.

It's the same reason Safari is in such a sorry state. Why make the web browser better when it could cannibalize your app store?

ashdksnndck 6 days ago | parent | next [-]

Hmm. Maybe the opportunity would be more like AMD, Intel, and the various AI labs and big tech get together, and by their powers combined figure out a way to stop giving NVIDIA their margin?

__s 5 days ago | parent [-]

They tried: OpenCL, OpenMP.

jpc0 6 days ago | parent | prev | next [-]

Somehow I want to believe that if you get everyone else in the room, and it becomes enough of a market force that Nvidia stops selling GPUs because of it, they will change. Cough, Linux GPU drivers.

pjmlp 6 days ago | parent | prev | next [-]

By making the web browser "better", do you mean more ChromeOS-like?

CUDA is great for Python as well.

Maybe Intel and AMD should actually produce something worth using.

Ygg2 5 days ago | parent | next [-]

> By making the web browser "better", do you mean more ChromeOS-like?

Whichever part makes Safari completely fail at properly rendering Jira. A task even Firefox can do.

ninkendo 5 days ago | parent [-]

> Whichever part makes Safari completely fail at properly rendering Jira

What evidence do you have that this is Safari’s fault and not Jira’s fault?

Give me a web browser and I will write code that will fail in it and work in other browsers.

pawelmurias 5 days ago | parent | prev [-]

Better for running web apps.

pjmlp 5 days ago | parent [-]

As long as they are using Web standards, and not Chrome APIs, I do agree.

shmerl 6 days ago | parent | prev [-]

Yeah, Nvidia can get lost with their CUDA moat. But AMD should be interested.

bobajeff 6 days ago | parent | prev | next [-]

Sounds sort of like the idea behind MLIR and its GPU dialects.

* https://mlir.llvm.org/docs/Dialects/NVGPU/

* https://mlir.llvm.org/docs/Dialects/AMDGPU/

* https://mlir.llvm.org/docs/Dialects/XeGPU/

jpc0 6 days ago | parent | next [-]

Very likely something along those lines.

Effectively, standardise passing operations off to a coprocessor. C++ is moving in that direction with stdexec, the linear algebra library, and SIMD.

I don’t see why Rust wouldn’t also do that.

Effectively: why must I write a GPU kernel to have an algorithm execute on the GPU? We're talking about memory wrangling and linear algebra almost all of the time when dealing with GPUs in any way whatsoever. I don't see why we need a different interface and API layer for that.

OpenGL et al. abstract some of the linear algebra away from you, which is nice until you need to give a damn about the assumptions they made that are no longer valid. I would rather that code be in a library in the language of your choice that you can inspect and understand than hidden somewhere in a driver behind 3 layers of abstraction.
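A purely hypothetical sketch of the kind of library-level interface that argument points at; none of these traits exist in any crate, this is just the shape of the idea:

  // Hypothetical: the "kernel" is ordinary Rust the caller can read, and the
  // executor is a library value you choose, not a separate shader language
  // hidden behind a driver. A real `Gpu` executor is the part that would need
  // compiler support (e.g. rust-gpu / rust-cuda codegen).

  trait Executor {
      fn map_inplace(&self, data: &mut [f32], f: impl Fn(f32) -> f32);
  }

  struct Cpu;

  impl Executor for Cpu {
      fn map_inplace(&self, data: &mut [f32], f: impl Fn(f32) -> f32) {
          for x in data.iter_mut() {
              *x = f(*x);
          }
      }
  }

  // The algorithm itself is executor-agnostic.
  fn scale(exec: &impl Executor, data: &mut [f32], k: f32) {
      exec.map_inplace(data, move |x| x * k);
  }

  fn main() {
      let mut v = vec![1.0_f32, 2.0, 3.0];
      scale(&Cpu, &mut v, 2.0);
      assert_eq!(v, [2.0, 4.0, 6.0]);
  }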

bobajeff 6 days ago | parent | next [-]

>I would rather that code be in a library in the language of your choice that you can inspect and understand than hidden somewhere in a driver behind 3 layers of abstraction.

I agree that that would be ideal. Hopefully it can happen one day with C++, Rust, and other languages. So far Mojo seems to be the only language close to that vision.

pjmlp 6 days ago | parent | prev [-]

Guess which companies have been driving the senders/receivers work.

trogdc 6 days ago | parent | prev [-]

These are just wrappers around intrinsics that exist in LLVM already.

mertcikla 5 days ago | parent | prev | next [-]

The LLVM people have been at it for a while now; they got it working on Nvidia and AMD, and are working on Apple, I believe: https://www.modular.com/

It baffles me that more people haven't heard of them. It's mighty impressive what they have achieved.

Voultapher 6 days ago | parent | prev | next [-]

Cool, looking forward to that. It's certainly a good fit for the Rust story overall, given the increasingly heterogeneous nature of systems.

junon 6 days ago | parent | prev [-]

I'm surprised there isn't already a Rust GPU WG. That'd be incredible.

markman 5 days ago | parent | prev | next [-]

I wish I could say that my failure to understand the contents of this article was just ignorance, but unfortunately it makes my brain want to explode. There is a backhanded compliment in there somewhere. What I mean is: you're a smart mofo.

shmerl 6 days ago | parent | prev [-]

Do you get any interest from big players like AMD? I'm surprised that they didn't start such an initiative themselves, but I guess they can just as well back yours.

LegNeato 18 hours ago | parent [-]

I have not, but it is easy to get in touch with me.

ants_everywhere 6 days ago | parent | prev | next [-]

Genuine question since you seem to care about the performance:

As an outsider, where we are with GPUs looks a lot like where we were with CPUs many years ago. And (AFAIK), the solution there was three-part compilers where optimizations happen on a middle layer and the third layer transforms the optimized code to run directly on the hardware. A major upside is that the compilers get smarter over time because the abstractions are more evergreen than the hardware targets.

Is that sort of thing possible for GPUs? Or is there too much diversity in GPUs to make it feasible/economical? Or is that obviously where we're going and we just don't have it working yet?

nicoburns 6 days ago | parent [-]

The status quo in GPU-land seems to be that the compiler lives in the GPU driver and is largely opaque to everyone other than the OS/GPU vendors. Sometimes there is an additional layer of compiler in user land that compiles into the language that the driver-compiler understands.

I think a lot of people would love to move to the CPU model where the actual hardware instructions are documented and relatively stable between different GPUs. But that's impossible to do unless the GPU vendors commit to it.

pornel 6 days ago | parent | next [-]

I would like CPUs to move to the GPU model, because in the CPU land adoption of wider SIMD instructions (without manual dispatch/multiversioning faff) takes over a decade, while in the GPU land it's a driver update.

To be clear, I'm talking about the PTX -> SASS compilation (which is something like LLVM bitcode to x86-64 microcode compilation). The fragmented and messy high-level shader language compilers are a different thing, in the higher abstraction layers.

sim7c00 6 days ago | parent | prev [-]

I think Intel and AMD provide ISA docs for their hardware. Not sure about Nvidia, I haven't checked in forever.

diabllicseagull 6 days ago | parent | prev | next [-]

Same here. I'm always hesitant to build anything commercial over abstractions, adapters, or translation layers that may or may not have sufficient support in the future.

Sadly, in 2025 we are still in desperate need of an open standard that is supported by all vendors and allows programming against the full feature set of current GPU hardware. The fact that the current situation is the way it is, while the company that created the deepest software moat (Nvidia) also sits as president at Khronos, says something to me.

pjmlp 6 days ago | parent [-]

Khronos APIs are the C++ of graphics programming; there is a reason professional game studios never wage political wars over APIs.

Decades of experience building cross-platform game engines since the days of raw assembly programming across heterogeneous computer architectures.

What matters is game design and IP, which they can eventually turn into physical assets like toys, movies, and collectibles.

Hardware abstraction layers are done once per platform; you can even let an intern do it, at least the initial hello triangle.

As for who sits as president at Khronos, that is decided by elections, as on any committee-driven standards body.

ducktective 6 days ago | parent [-]

I think you are very experienced in this subject. Can you explain what's wrong with WebGPU? Doesn't it utilize like 80% of the cool features of modern GPUs? Games and ambitious graphics-hungry applications aside, why aren't we seeing more tech built on top of WebGPU, like GUI stacks? Why aren't we seeing browsers and web apps using it?

Do you recommend learning it (considering all the things worth learning nowadays and the rise of LLMs)?

MindSpunk 6 days ago | parent | next [-]

WebGPU is about a decade behind in feature support compared to what is available in modern GPUs. Things missing include:

- Bindless resources

- RT acceleration

- 64-bit image atomic operations (these are what make Nanite's software rasterizer possible)

- mesh shaders

It has compute shaders at least. There are a lot of extensions, less flashy to non-experts, being added to Vulkan and D3D12 lately that remove abstractions in ways WebGPU can't without becoming a security nightmare. Outside of the rendering algorithms themselves, the vast majority of API surface area in Vulkan/D3D12 is just ceremony around allocating memory for different purposes. New stuff like descriptor buffers in Vulkan is removing that ceremony in a very core area, but it's unlikely to ever come to WebGPU.

FWIW, some of these features are available outside the browser via 'wgpu' and/or 'Dawn', but that doesn't help people in the browser.

johncolanduoni 6 days ago | parent | prev | next [-]

For beginner-to-intermediate raster use cases where you want to direct the GPU yourself, WebGPU will cover most of what you're looking for with reasonably low cognitive overhead. It's at a roughly DX11/Metal level of abstraction, but with some crucial modernizations that reduce friction between WebGPU and the underlying APIs (pipeline layouts being the biggest one).

The main things it's missing for more advanced raster use cases aren't really GPU features as such, but low-level control over the driver. Unlike Vulkan/DX12, it does not permit (or require) you to manage allocations of individual resources in GPU memory, or perform your own CPU/GPU and GPU/GPU synchronization. The rationale (as I recall from some of the people working on the standard talking on GitHub issues or somewhere similar) was that the computational overhead of verifying that you've done these things correctly erases the gains from giving the user this level of control. For the web platform, verification that you're not going to corrupt memory or expose undefined driver behavior is non-negotiable. This explanation makes sense to me, and I haven't really heard anyone put forward a generalizable scheme that allows you Vulkan levels of control with efficient verification.
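For reference, this is roughly what the explicit pipeline-layout step looks like in the Rust `wgpu` crate (a recent release is assumed; field names have shifted a little between versions):

  // Declares up front what resources a pipeline will bind: one uniform buffer
  // visible to the vertex and fragment stages. WebGPU validates against this
  // layout instead of letting you manage raw memory/synchronization yourself.
  fn make_layout(device: &wgpu::Device) -> wgpu::PipelineLayout {
      let globals = device.create_bind_group_layout(&wgpu::BindGroupLayoutDescriptor {
          label: Some("globals"),
          entries: &[wgpu::BindGroupLayoutEntry {
              binding: 0,
              visibility: wgpu::ShaderStages::VERTEX | wgpu::ShaderStages::FRAGMENT,
              ty: wgpu::BindingType::Buffer {
                  ty: wgpu::BufferBindingType::Uniform,
                  has_dynamic_offset: false,
                  min_binding_size: None,
              },
              count: None,
          }],
      });

      device.create_pipeline_layout(&wgpu::PipelineLayoutDescriptor {
          label: Some("main"),
          bind_group_layouts: &[&globals],
          push_constant_ranges: &[],
      })
  }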

3836293648 6 days ago | parent | prev [-]

First of all, WebGPU has only been supported in Chrome for a few months, and in Firefox only in the next release. And that's just on Windows.

We haven't had enough time to develop anything really.

Secondly, the WebGPU standard is like Vulkan 1.0 and is cumbersome to work with. But that part is hearsay; I don't have much experience with it.

johncolanduoni 6 days ago | parent | next [-]

WebGPU has been supported in stable Chrome for over 2 years. I'm also really not sure what about it is much like Vulkan 1.0 (or any other version). No manual allocations, no manual state transitions, almost no manual synchronization. If I had to pick another graphics API to compare it to in terms of abstraction level and exposed primitives, Metal 1 is probably the closest.

sim7c00 6 days ago | parent | prev [-]

GPU programming is often cumbersome though. I mean, OpenGL, Vulkan, they are not really trivial?

3836293648 6 days ago | parent [-]

OpenGL is trivial compared to Vulkan. And apparently Vulkan has gotten much easier today compared to its initial release in 2016.

kookamamie 6 days ago | parent | prev | next [-]

Exactly. Not sure why it would be better to run Rust on Nvidia GPUs compared to actual CUDA code.

I get the idea of added abstraction, but do think it becomes a bit jack-of-all-tradesey.

rbanffy 6 days ago | parent | next [-]

I think the idea is to allow developers to write a single implementation and have a portable binary that can run on any kind of hardware.

We do that all the time: there is lots of code that chooses optimal code paths depending on the runtime environment or which ISA extensions are available.
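A minimal example of that CPU-side pattern in Rust, using the standard library's runtime feature detection:

  // Pick a code path at runtime based on which ISA extensions the machine has.
  #[cfg(target_arch = "x86_64")]
  fn sum(data: &[f32]) -> f32 {
      if is_x86_feature_detected!("avx2") {
          // Only reached on CPUs that actually report AVX2.
          unsafe { sum_avx2(data) }
      } else {
          sum_scalar(data)
      }
  }

  fn sum_scalar(data: &[f32]) -> f32 {
      data.iter().sum()
  }

  #[cfg(target_arch = "x86_64")]
  #[target_feature(enable = "avx2")]
  unsafe fn sum_avx2(data: &[f32]) -> f32 {
      // With AVX2 enabled the compiler is free to vectorize this loop; a real
      // implementation might use core::arch intrinsics explicitly.
      data.iter().sum()
  }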

pjmlp 6 days ago | parent | next [-]

Without the tooling though.

Commendable effort; however, just as people forget that languages are ecosystems, they tend to forget that APIs are ecosystems as well.

kookamamie 6 days ago | parent | prev [-]

Sure. The performance-purist in me would be very doubtful about the result's optimality, though.

littlestymaar 6 days ago | parent [-]

Performance purists don't use CUDA either, though (that's why DeepSeek used PTX directly).

Everything is an abstraction, and choosing the right level of abstraction for your use case is a tradeoff between your engineering capacity and your performance needs.

LowLevelMahn 6 days ago | parent | next [-]

this Rust demo also uses PTX directly

  During the build, build.rs uses rustc_codegen_nvvm to compile the GPU kernel to PTX.
  The resulting PTX is embedded into the CPU binary as static data.
  The host code is compiled normally.
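To make that flow concrete, here is a rough host-side sketch of what loading the embedded PTX looks like with Rust-CUDA's cust crate. Function and macro names are from memory and may not match the current release exactly; the kernel name and output path are made up for illustration.

  use cust::prelude::*;

  // build.rs wrote the compiled kernel PTX here (path is illustrative).
  static PTX: &str = include_str!(concat!(env!("OUT_DIR"), "/kernels.ptx"));

  fn main() -> Result<(), Box<dyn std::error::Error>> {
      let _ctx = cust::quick_init()?;                    // create a CUDA context
      let module = Module::from_ptx(PTX, &[])?;          // JIT the embedded PTX
      let kernel = module.get_function("add")?;          // hypothetical kernel name
      let stream = Stream::new(StreamFlags::NON_BLOCKING, None)?;

      let a = DeviceBuffer::from_slice(&[1.0f32, 2.0, 3.0])?;
      let b = DeviceBuffer::from_slice(&[4.0f32, 5.0, 6.0])?;
      let out = DeviceBuffer::from_slice(&[0.0f32; 3])?;

      unsafe {
          // 1 block of 3 threads, no shared memory, on `stream`.
          launch!(kernel<<<1, 3, 0, stream>>>(
              a.as_device_ptr(), b.as_device_ptr(), out.as_device_ptr(), 3usize
          ))?;
      }
      stream.synchronize()?;
      Ok(())
  }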
LegNeato 6 days ago | parent | next [-]

To be more technically correct, we compile to NVVM IR and then use NVIDIA's NVVM to convert it to PTX.

saagarjha 5 days ago | parent | prev [-]

That’s not really the same thing; it compiles through PTX rather than using inline assembly.

LegNeato 18 hours ago | parent [-]

FYI, you can drop down into PTX if need be:

https://github.com/Rust-GPU/Rust-CUDA/blob/aa7e61512788cc702...

brandonpelfrey 6 days ago | parent | prev [-]

The issue in my mind is that this doesn't seem to include any of the critical library functionality specific to, e.g., NVIDIA cards; think reduction operations across threads in a warp and similar. Some of those don't exist in all hardware architectures. We may get to a point where everything could be written in one language, but actually leveraging the hardware correctly still requires a bunch of different implementations, one for each target architecture.

The fact that different hardware has different features is a good thing.

rbanffy 6 days ago | parent [-]

The features missing hardware support can fall back to software implementations.

In any case, ideally, the level of abstraction would be higher, with little application logic requiring GPU architecture awareness.

MuffinFlavored 6 days ago | parent | prev | next [-]

> Exactly. Not sure why it would be better to run Rust on Nvidia GPUs compared to actual CUDA code.

You get to pull in no_std Rust crates and they run on the GPU, instead of having to convert them to C++.
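For example, a dependency like this (a hypothetical `lib.rs`) has no OS, allocator, or std dependency, so the same source can be compiled for the host and, via the GPU codegen backends, for the device:

  #![no_std]

  /// Plain math with no std dependency: callable from host code or from inside
  /// a GPU kernel compiled by rust-gpu / rust-cuda.
  pub fn lerp(a: f32, b: f32, t: f32) -> f32 {
      a + (b - a) * t
  }

  pub fn saturate(x: f32) -> f32 {
      if x < 0.0 { 0.0 } else if x > 1.0 { 1.0 } else { x }
  }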

JayEquilibria 6 days ago | parent | prev | next [-]

Good stuff. I have been thinking of learning Rust because of people here, even though CUDA is what I care about.

My abstractions, though, are probably best served by PyTorch and Julia, so Rust is just a waste of time, FOR ME.

the__alchemist 6 days ago | parent | prev | next [-]

I think the sweet spot is:

If your program is written in Rust, use an abstraction like Cudarc to send and receive data from the GPU. Write normal CUDA kernels.
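A rough sketch of that split with the cudarc crate: the kernel stays plain CUDA C, and Rust only handles compilation, data movement, and the launch. API names are based on cudarc releases around 0.11 and have changed in newer versions, so treat this as the shape of the approach rather than exact code.

  use cudarc::driver::{CudaDevice, LaunchAsync, LaunchConfig};
  use cudarc::nvrtc::compile_ptx;

  fn main() -> Result<(), Box<dyn std::error::Error>> {
      // A normal CUDA kernel, compiled at runtime with NVRTC.
      let ptx = compile_ptx(r#"
          extern "C" __global__ void scale(float *x, float k, size_t n) {
              size_t i = blockIdx.x * blockDim.x + threadIdx.x;
              if (i < n) x[i] *= k;
          }"#)?;

      let dev = CudaDevice::new(0)?;
      dev.load_ptx(ptx, "module", &["scale"])?;
      let f = dev.get_func("module", "scale").unwrap();

      let mut x = dev.htod_copy(vec![1.0f32; 1024])?;    // host -> device
      unsafe {
          f.launch(LaunchConfig::for_num_elems(1024), (&mut x, 2.0f32, 1024usize))?;
      }
      let out = dev.dtoh_sync_copy(&x)?;                 // device -> host
      assert!(out.iter().all(|&v| v == 2.0));
      Ok(())
  }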

Ar-Curunir 6 days ago | parent | prev [-]

Because folks like to program in Rust, not CUDA

tucnak 6 days ago | parent [-]

"Folks" as-in Rust stans, whom know very little about CUDA and what makes it nice in the first place, sure, but is there demand for Rust ports amongst actual CUDA programmers?

I think not.

LegNeato 6 days ago | parent | next [-]

FYI, Rust-CUDA outputs NVVM IR so it can integrate with the existing CUDA ecosystem. We aren't suggesting rewriting everything in Rust. Check the repo for crates that allow using existing stuff like cuDNN and cuBLAS.

apitman 6 days ago | parent | next [-]

Do you have a link? I went to the Rust GPU repo and didn't see anything. I have an academic pipeline that is currently heavily tied to CUDA because we need nvCOMP. Eventually we hope that we or someone else will make an open source gzip library for GPUs. Until then, it would at least be nice if we could implement our other pipeline stages in something more open.

LegNeato 5 days ago | parent [-]

https://github.com/Rust-GPU/Rust-CUDA/tree/main/crates/cudnn and the parent dir has more.

tucnak 6 days ago | parent | prev [-]

I take it you're the maintainer. Firstly, congrats on the work done; open source people are a small crowd, and the determination of the Rust teams here is commendable. On the other hand, I'm struggling to see the unique value proposition. What is your motivation with Rust-GPU: graphics or general-purpose computing? If it's the latter, at least from my POV, I would struggle to justify going up against a daunting umbrella project like this, in view of it likely culminating in layers upon layers of abstraction. Is the long-term goal here to have fun writing a bit of Rust, or upsetting the entrenched status quo of highly concurrent GPU programming? There's a saying that goes something like "pleasing all is a lot like pleasing none," and intuitively I would guess it applies here.

LegNeato 6 days ago | parent [-]

Thanks! I'm personally focused on compute, while other contributors are focused on graphics.

I believe GPUs are the future of computing. I think the tooling, languages, and ecosystems of GPUs are very bad compared to CPUs. Partially because they are newer, partially because they are different, and partially because for some reason the expectations are so low. So I intend to upset the status quo.

tucnak 6 days ago | parent [-]

Have you considered post-GPU accelerators? For large-scale machine learning, TPUs have basically won. There are new vendors like Tenstorrent offering completely new (and much simpler) computing hardware. GPUs may well be living on borrowed time as far as compute is concerned.

LegNeato 5 days ago | parent [-]

Yes, see the proposed WG link I posted above. When I say GPU I'm using it as shorthand...indeed, I think the "graphics" part is on borrowed time and will just become fully software. It is already happening.

tucnak 5 days ago | parent [-]

Tenstorrent is often criticised for having lots of abstraction layers, compilers, and IRs in the middle (it's all in C++, of course). GPUs are okay, but none of them have network-on-chip capability. Some promising papers have been coming out, like SystolicAttention, etc. There's just so much stuff for GPUs, but not that much for systolic NoC systems (TPUs, TT, NPUs). I think Rust could really make an impact here. Abandon all GPU deadweight, stick to simple abstractions, assume a 3D twisted torus for topology, and that's it. Food for thought!

Ar-Curunir 6 days ago | parent | prev | next [-]

Rust expanded systems programming to a much larger audience. If it can do the same for GPU programming, even if the resulting programs are not (initially) as fast as CUDA programs, that's a big win.

tayo42 6 days ago | parent | prev [-]

What makes CUDA nice in the first place?

tucnak 6 days ago | parent [-]

All the things marked with a red cross in the Rust-CUDA compatibility matrix.

https://github.com/Rust-GPU/Rust-CUDA/blob/main/guide/src/fe...

dvtkrlbs 6 days ago | parent [-]

I mean, that will only improve with time, no? Maintainers recently revived the rust-gpu and rust-cuda backends. I don't think even the maintainer would say this is ready for prime time. Another benefit is being able to run the same code (library, aka crate) on the CPU and GPU. This would require really good optimization in the GPU backend to get the full benefit, but I definitely see the value proposition and the potential.

apitman 6 days ago | parent [-]

Those red Xs are libraries that only work on Nvidia GPUs and would take a massive amount of work to re-implement in a cross-platform way, and you may never achieve the same performance, either because of abstraction or because you can't match the engineering resources Nvidia has thrown at this over the past decade. This is their moat.

I think open source alternatives will come in time, but it's a lot.

dvtkrlbs 5 days ago | parent | next [-]

I don't see any of the Xs that it wouldn't be possible to generate code for and expose compiler intrinsics for. You don't reinvent the wheel here; you generate the NVVM bytecode and let Nvidia handle the rest.

dvtkrlbs 5 days ago | parent [-]

Wait, my browser scrolled to the wrong place. For libraries it is even easier to create or write bindings, and as the status says, several are already in progress.

pjmlp 6 days ago | parent | prev [-]

Plus IDE integration, GPU graphical debugging with the same quality as Visual Studio debugging, and a polyglot ecosystem.

hyperbolablabla 5 days ago | parent | prev | next [-]

What we really need is a consistent GPU ISA. If it weren't for the fairly recent proliferation of ARM CPUs, we would have more or less rallied around x86 as the de facto ISA for general-purpose compute. I'm not sure why we couldn't do the same for GPUs as well.

littlestymaar 6 days ago | parent | prev | next [-]

Everything is an abstraction, though; even CUDA abstracts away very different pieces of hardware with totally different capabilities.

rowanG077 6 days ago | parent | prev | next [-]

So what do you use? CUDA abstracts over the GPU hardware, OpenCL does, Vulkan does. I guess you can write raw PTX?

theknarf 5 days ago | parent | prev [-]

If only everyone could have agreed on SPIR-V.