bigyabai 2 hours ago

I think you misunderstand what's fundamentally possible with AMD's architecture. They can't wave a magic wand and produce a CUDA compatibility layer any better than Apple or Qualcomm can; it's not low-hanging fruit like DirectX or Win32 translation. Investing billions into translating CUDA on raster GPUs is a dead end.

AMD's best option is a greenfield GPU architecture that puts CUDA in the crosshairs, which is what they already did for datacenter customers with AMD Instinct.

KeplerBoy 2 hours ago | parent | next [-]

This is a big part of why AMD still doesn't have a proper foothold in the space: AMD Instinct is quite different from what regular folks can easily put in their workstation. In Nvidia-land I can put anything from a mid-range gaming card to a 5090 to an RTX 6000 Pro in my machine and be confident that my CUDA code will scale somewhat acceptably to a datacenter GPU.

bigyabai 2 hours ago | parent [-]

This is where I feel like Khronos could contribute, making a Compute Capability-equivalent hardware standard for vendors to implement. CUDA's versioning of hardware capabilities plays a huge role in clarifying the support matrix.

...but that requires buy-in from the rest of the industry, and it's doubtful FAANG is willing to thread that needle together. Nvidia's hedged bet against industry-wide cooperation is making Jensen the 21st-century Mansa Musa.

colordrops an hour ago | parent | prev [-]

I do not misunderstand.

Let's say you put 50-100 seasoned devs on the problem. Within 2-3 years, you could probably get ZLUDA to the point where most mainstream CUDA applications (ML training/inference, scientific computing, rendering) run correctly on AMD hardware at 70-80% of the performance you'd get from a native ROCm port. Even if it's not optimal due to hardware differences, it would be genuinely transformative and commercially valuable.

This would give them runway for their parallel effort to build native greenfield libraries and toolkits and get adoption, and perhaps make some tweaks to future hardware iterations that make compatibility easier.