littlestymaar 6 days ago
Performance purists don't use CUDA either, though (that's why DeepSeek used PTX directly). Everything is an abstraction, and choosing the right level of abstraction for your use case is a tradeoff between your engineering capacity and your performance needs.
LowLevelMahn 6 days ago
This Rust demo also uses PTX directly.
brandonpelfrey 6 days ago
The issue in my mind is that this doesn't seem to include any of the critical library functionality that is specific to, e.g., NVIDIA cards: think reduction operations across the threads in a warp, and similar. Some of those don't exist on all hardware architectures. We may get to a point where everything can be written in one language, but actually leveraging the hardware correctly will still require a bunch of different implementations, one for each target architecture. The fact that different hardware has different features is a good thing.
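For readers unfamiliar with warp-level reductions, here is a minimal sketch of the idea as a plain CPU model in Rust. It is not GPU code and is not taken from the demo under discussion: `shfl_down` and `warp_reduce_sum` are hypothetical helper names, and the 32-lane array only imitates what a hardware shuffle (such as CUDA's `__shfl_down_sync`) does in a single instruction across the threads of a warp.

```rust
/// Number of lanes in the modeled warp (32 on NVIDIA GPUs).
const WARP_SIZE: usize = 32;

/// CPU model of a shuffle-down: each lane reads the value held by the lane
/// `delta` positions above it. Lanes whose source would fall outside the
/// warp keep their own value.
fn shfl_down(lanes: &[f32; WARP_SIZE], delta: usize) -> [f32; WARP_SIZE] {
    let mut out = *lanes;
    for lane in 0..WARP_SIZE {
        if lane + delta < WARP_SIZE {
            out[lane] = lanes[lane + delta];
        }
    }
    out
}

/// Tree reduction over the warp: the offset halves each step (16, 8, 4, 2, 1),
/// so after five steps lane 0 holds the sum of all 32 inputs. The upper lanes
/// end up with partial values that are simply ignored.
fn warp_reduce_sum(mut lanes: [f32; WARP_SIZE]) -> f32 {
    let mut delta = WARP_SIZE / 2;
    while delta > 0 {
        let shifted = shfl_down(&lanes, delta);
        for lane in 0..WARP_SIZE {
            lanes[lane] += shifted[lane];
        }
        delta /= 2;
    }
    lanes[0]
}
```

On real hardware the inner loop disappears, since every lane performs its shuffle and add in lockstep. That is why this kind of primitive has to be exposed per architecture rather than expressed as ordinary portable code.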