dd8601fn 21 hours ago

I’ve heard the same about Nvidia, quite a few times, but have never really understood it.

I don’t suppose you know a good “for dummies” explanation of why CUDA is such an insurmountable moat for them?

Like, what is it about that software that AMD can't produce for their own hardware, or at least for the most important subset, with these $1T market stakes?

ecshafer 19 hours ago | parent | next [-]

CUDA is a GPGPU software layer that is very mature and integrates with C, C++, Python, and Fortran very well. AMD just never really got the same quality of GPGPU software in the last 20 years. 99% of scientific computing that uses GPUs (which is a lot, since they are so much faster than CPUs for linear algebra) has gone to Nvidia because of this. All of the big AI libraries (TensorFlow, PyTorch) basically ended up writing around CUDA, so they just didn't write things for AMD. Plus if you go and look at a job for signal processing or whatever at say Lockheed Martin or Raytheon, they specify CUDA.
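
To give a rough idea of what that layer looks like, here's a minimal CUDA vector-add in C++ (a sketch only; the kernel name, sizes, and use of unified memory are just illustrative):

    #include <cuda_runtime.h>
    #include <cstdio>

    // Each GPU thread adds one pair of elements.
    __global__ void vec_add(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);

        float *a, *b, *c;
        // Unified memory: accessible from both CPU and GPU.
        cudaMallocManaged(&a, bytes);
        cudaMallocManaged(&b, bytes);
        cudaMallocManaged(&c, bytes);
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        // Launch enough 256-thread blocks to cover all n elements.
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        vec_add<<<blocks, threads>>>(a, b, c, n);
        cudaDeviceSynchronize();

        printf("c[0] = %f\n", c[0]);  // expect 3.0
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }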

throwup238 21 hours ago | parent | prev | next [-]

> I don’t suppose you know a good “for dummies” explanation of why CUDA is such an insurmountable moat for them?

Theoretically the moat isn’t insurmountable and AMD has made some inroads thanks to the open source community but in practice a generic CUDA layer requires a ton of R&D that AMD hasn’t been able to afford since the ATI acquisition. It’s been fighting for its existence for most of that time and just never had the money to invest in catching up to NVIDIA beyond the hardware. Even something as seemingly simple as porting the BLAS library to CUDA is a significant undertaking that has to validate numerical codes while dealing with floating point subtleties. The CPU versions of these libraries are so foundational and hard to get right that they’re still written in FORTRAN and haven’t changed much in decades. Everything built on top of those libraries then requires having customers who can help you test and profile real code in use. When people say that software isn’t a moat they’re talking about basic CRUD over a business domain where all it takes is a competent developer and someone with experience in the industry to replicate. CUDA is about as far from that as you can get in software without stepping on Mentor Graphics’ or Dassault’s toes.
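
To make the BLAS point concrete, here's a sketch of the kind of validation involved, assuming cuBLAS (NVIDIA's GPU BLAS): you compare the GPU GEMM against a plain CPU reference within a tolerance rather than bit-for-bit, because the GPU sums in a different order and rounds differently. Matrix size and tolerance here are arbitrary.

    #include <cublas_v2.h>
    #include <cuda_runtime.h>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 256;  // square matrices, column-major
        std::vector<float> A(n * n), B(n * n), C_ref(n * n, 0.0f);
        for (int i = 0; i < n * n; ++i) { A[i] = 0.001f * (i % 97); B[i] = 0.002f * (i % 89); }

        // CPU reference GEMM (naive triple loop, column-major).
        for (int col = 0; col < n; ++col)
            for (int row = 0; row < n; ++row) {
                float acc = 0.0f;
                for (int k = 0; k < n; ++k) acc += A[row + k * n] * B[k + col * n];
                C_ref[row + col * n] = acc;
            }

        // Same GEMM on the GPU via cuBLAS.
        float *dA, *dB, *dC;
        cudaMalloc(&dA, n * n * sizeof(float));
        cudaMalloc(&dB, n * n * sizeof(float));
        cudaMalloc(&dC, n * n * sizeof(float));
        cudaMemcpy(dA, A.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(dB, B.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);

        cublasHandle_t h;
        cublasCreate(&h);
        const float alpha = 1.0f, beta = 0.0f;
        cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);

        std::vector<float> C_gpu(n * n);
        cudaMemcpy(C_gpu.data(), dC, n * n * sizeof(float), cudaMemcpyDeviceToHost);

        // Validate within a tolerance, not bit-for-bit: a different
        // summation order on the GPU gives slightly different rounding.
        float max_rel_err = 0.0f;
        for (int i = 0; i < n * n; ++i) {
            float err = std::fabs(C_gpu[i] - C_ref[i]) / std::fmax(std::fabs(C_ref[i]), 1e-6f);
            if (err > max_rel_err) max_rel_err = err;
        }
        printf("max relative error vs CPU reference: %g\n", max_rel_err);

        cublasDestroy(h);
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        return 0;
    }

Built with something like nvcc -lcublas (assuming a standard CUDA toolkit install). Now scale that exercise up to entire numerical libraries and the codes that depend on them.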

There’s a second factor which is that hardware companies tend to have horrible software cultures, especially when silicon is the center of gravity. The hardware guys in leadership discount the value of software and that philosophy works itself down the hierarchy. In this respect NVIDIA is very much an outlier and it shows in CUDA. Their moat isn’t just the software but the organization that allowed it to flourish in a hardware company, which predates their success in AI (NVIDIA has worked with game developers for decades to optimize individual games).

franktankbank 19 hours ago | parent [-]

Maybe nobody reputable has released non-Fortran versions, but they probably exist.

throwup238 16 hours ago | parent [-]

Lots of other versions exist including reputable ones like Intel’s MKL. The hard part isn’t reimplementing it, it’s validating the output across a massive corpus of scientific work.

BLAS is just one example, though; it's the tip of the iceberg.
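
A tiny illustration of why bit-for-bit checks fail in the first place: floating-point addition isn't associative, so a GPU that reduces in a different order can legitimately produce a different answer (plain C++, values picked just to show the effect):

    #include <cstdio>

    int main() {
        // Floating-point addition is not associative: summing the same
        // numbers in a different order (as a GPU reduction does) can
        // change the result.
        float a = 1e8f, b = -1e8f, c = 1.0f;
        printf("(a + b) + c = %.1f\n", (a + b) + c);  // prints 1.0
        printf("a + (b + c) = %.1f\n", a + (b + c));  // prints 0.0
        return 0;
    }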

DaedalusII 21 hours ago | parent | prev | next [-]

The first problem is that a whole generation of people learned to code AI applications by fiddling around with the GPU in their gaming PC 10 years ago, so an entire generation of talent grew up with CUDA.

The second problem is that so many libraries and so much existing software are CUDA-only. Even some obscure hardware stuff: I discovered the hard way that some AMD ThinkPads don't support Thunderbolt transfer speeds on their USB-C ports, whereas Nvidia ones do.

The third problem is that the cost to develop a CUDA equivalent is so great that it's cheaper for companies like Google to make TPUs and Amazon to make Trainium. It's literally cheaper to make an entire new chipset than it is to fix AMD. I don't see companies like Apple/Amazon/Google etc. fixing AMD's chips.

staticman2 19 hours ago | parent [-]

> It's literally cheaper to make an entire new chipset than it is to fix AMD

Is it? Or is it that AMD expects to make a profit, so it's cheaper to make your own chips at cost?

DaedalusII 10 hours ago | parent [-]

I mean it's cheaper from an enterprise customer perspective. If a company is training an LLM, writing their training programs to use AMD's hardware instead of just using CUDA is so expensive and time-consuming that it is cheaper to pay four times the price and use Nvidia hardware. In this space it's important to move fast, although that economic calculus will shift over time.

Which is why Nvidia hardware trades at a 4x premium to AMD.

It's not necessarily cheaper to make chips at cost either. Nobody is actually making them, only designing them. So first you have to design your new chip, then you have to get a minimum order in with a chip fab big enough that you compete on unit economics, and then finally you have to get your dev team to write a CUDA-equivalent software stack, a problem so hard it's only really been solved by Apple, Google, Intel, and Nvidia.

Only companies with big fab orders can get priority too. If a company did all of the above and was ready to go, they probably wouldn't get fab capacity until 2030.

surgical_fire 21 hours ago | parent | prev [-]

My understanding of this may be spotty (and I'd appreciate it if someone corrects me), but isn't CUDA the software layer that allows you to use NVIDIA GPUs for AI processing?

AMD may develop their own software layer, but a lot of things already work on CUDA, and the job of porting them to a different platform may be non-trivial (or not even possible, depending on the level of feature parity).
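
For a rough sense of what porting at the lowest level involves: AMD's answer is HIP (part of ROCm), whose runtime API mostly mirrors CUDA's, so a simple kernel ports almost mechanically. The comments below sketch the HIP equivalents next to the CUDA calls (an illustration only; the hard part is the libraries, tooling, and performance work above this layer):

    #include <cuda_runtime.h>   // HIP port: #include <hip/hip_runtime.h>
    #include <cstdio>

    // Kernel code is identical under CUDA and HIP.
    __global__ void scale(float* x, float s, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= s;
    }

    int main() {
        const int n = 1024;
        float host[n];
        for (int i = 0; i < n; ++i) host[i] = 1.0f;

        float* dev;
        cudaMalloc(&dev, n * sizeof(float));          // HIP: hipMalloc
        cudaMemcpy(dev, host, n * sizeof(float),
                   cudaMemcpyHostToDevice);           // HIP: hipMemcpy / hipMemcpyHostToDevice
        scale<<<n / 256, 256>>>(dev, 3.0f, n);        // same launch syntax in HIP
        cudaMemcpy(host, dev, n * sizeof(float),
                   cudaMemcpyDeviceToHost);           // HIP: hipMemcpyDeviceToHost
        cudaFree(dev);                                // HIP: hipFree

        printf("host[0] = %f\n", host[0]);  // expect 3.0
        return 0;
    }

AMD's hipify tools do this kind of renaming automatically for simple code; the non-trivial part is everything that leans on CUDA-only libraries and features.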