| ▲ | rbanffy 6 days ago |
| I think the idea is to allow developers to write a single implementation and have a portable binary that can run on any kind of hardware. We do that all the time - there is lots of code that chooses optimal code paths depending on the runtime environment or on which ISA extensions are available. |
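A minimal sketch of that CPU-side pattern, assuming plain Rust with std: detect an ISA extension at runtime and dispatch to it, with a portable fallback (the AVX2 helper is illustrative, not from any particular project).

    fn sum(values: &[f32]) -> f32 {
        #[cfg(target_arch = "x86_64")]
        {
            if is_x86_feature_detected!("avx2") {
                // Safe because we just checked that AVX2 is available on this CPU.
                return unsafe { sum_avx2(values) };
            }
        }
        values.iter().sum() // portable path, works on any architecture
    }

    #[cfg(target_arch = "x86_64")]
    #[target_feature(enable = "avx2")]
    unsafe fn sum_avx2(values: &[f32]) -> f32 {
        // A real implementation would use core::arch AVX2 intrinsics here;
        // #[target_feature] lets the compiler assume AVX2 inside this function.
        values.iter().sum()
    }

The single-binary GPU story is the same idea one level up: ship one artifact and pick the best available path at run time.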
|
| ▲ | pjmlp 6 days ago | parent | next [-] |
| Without the tooling, though. Commendable effort, but just as people forget that languages are ecosystems, they tend to forget that APIs are ecosystems as well. |
|
| ▲ | kookamamie 6 days ago | parent | prev [-] |
| Sure. The performance-purist in me would be very doubtful about the result's optimality, though. |
| |
| ▲ | littlestymaar 6 days ago | parent [-] | | The performance purist doesn't use CUDA either, though (that's why DeepSeek used PTX directly). Everything is an abstraction, and choosing the right level of abstraction for your use case is a tradeoff between your engineering capacity and your performance needs. | | |
| ▲ | LowLevelMahn 6 days ago | parent | next [-] | | This Rust demo also uses PTX directly:
- During the build, build.rs uses rustc_codegen_nvvm to compile the GPU kernel to PTX.
- The resulting PTX is embedded into the CPU binary as static data.
- The host code is compiled normally.
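A rough sketch of what the host side of that looks like, loosely following the cust crate's published examples; the kernel name "add", its argument list, and the kernel.ptx path are illustrative, not taken from this demo.

    use cust::prelude::*;
    use std::error::Error;

    // build.rs wrote the PTX into OUT_DIR; include_str! bakes it into the binary.
    static PTX: &str = include_str!(concat!(env!("OUT_DIR"), "/kernel.ptx"));

    fn main() -> Result<(), Box<dyn Error>> {
        let _ctx = cust::quick_init()?;            // create a CUDA context
        let module = Module::from_ptx(PTX, &[])?;  // hand the embedded PTX to the driver
        let kernel = module.get_function("add")?;
        let stream = Stream::new(StreamFlags::NON_BLOCKING, None)?;

        let lhs = DeviceBuffer::from_slice(&[1.0f32; 1024])?;
        let rhs = DeviceBuffer::from_slice(&[2.0f32; 1024])?;
        let out = DeviceBuffer::from_slice(&[0.0f32; 1024])?;

        unsafe {
            cust::launch!(kernel<<<4, 256, 0, stream>>>(
                lhs.as_device_ptr(),
                rhs.as_device_ptr(),
                out.as_device_ptr(),
                out.len()
            ))?;
        }
        stream.synchronize()?;

        let mut host_out = vec![0.0f32; 1024];
        out.copy_to(&mut host_out)?;               // copy results back to the CPU
        Ok(())
    }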
| | |
| ▲ | LegNeato 6 days ago | parent | next [-] | | To be more technically correct, we compile to NVVM IR and then use NVIDIA's NVVM to convert it to PTX. | |
| ▲ | saagarjha 5 days ago | parent | prev [-] | | That’s not really the same thing; it compiles through PTX rather than using inline assembly. | | |
| |
| ▲ | brandonpelfrey 6 days ago | parent | prev [-] | | The issue in my mind is that this doesn’t seem to include any of the critical library functionality specific to, e.g., NVIDIA cards, such as reduction operations across the threads in a warp. Some of those don’t exist in all hardware architectures. We may get to a point where everything can be written in one language, but actually leveraging the hardware correctly still requires a bunch of different implementations, one for each target architecture. The fact that different hardware has different features is a good thing. | | |
| ▲ | rbanffy 6 days ago | parent [-] | | Features that lack hardware support can fall back to software implementations. In any case, ideally, the level of abstraction would be higher, with little application logic requiring GPU architecture awareness. |
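A sketch of what that selection could look like on the host side, assuming the runtime can report device capabilities; the capability flag and kernel names here are hypothetical, not an existing API.

    // Hypothetical capability query result; a real runtime would fill this in
    // from the device (compute capability, subgroup support, etc.).
    struct DeviceCaps {
        has_warp_shuffle: bool, // e.g. NVIDIA warp shuffle / subgroup reductions
    }

    enum ReduceKernel {
        WarpShuffle,        // hardware warp-level reduction where it exists
        TreeInSharedMemory, // portable software fallback that works everywhere
    }

    fn pick_reduce_kernel(caps: &DeviceCaps) -> ReduceKernel {
        if caps.has_warp_shuffle {
            ReduceKernel::WarpShuffle
        } else {
            ReduceKernel::TreeInSharedMemory
        }
    }

    fn main() {
        // In practice this would come from a device query at startup.
        let caps = DeviceCaps { has_warp_shuffle: false };
        match pick_reduce_kernel(&caps) {
            ReduceKernel::WarpShuffle => println!("dispatching warp-shuffle reduction"),
            ReduceKernel::TreeInSharedMemory => println!("dispatching portable tree reduction"),
        }
    }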
|
|
|