▲ | spmurrayzzz 3 days ago | |||||||||||||||||||||||||
CUDA isn't the moat people think it is. NVIDIA absolutely has the best dev ergonomics for machine learning, there's no question about that. Their driver is also far more stable than AMD's. But AMD is also improving, they've made some significant strides over the last 12-18 months. But I think more importantly, what is often missed in this analysis is that most programmers doing ML work aren't writing their own custom kernels. They're just using pytorch (or maybe something even more abstracted/multi-backend like keras 3.x) and let the library deal with implementation details related to their GPU. That doesn't mean there aren't footguns in that particular land of abstraction, but the delta between the two providers is not nearly as stark as its often portrayed. At least not for the average programmer working with ML tooling. (EDIT: also worth noting that the work being done in the MLIR project has a role to play in closing the gap as well for similar reasons) | ||||||||||||||||||||||||||
▲ | martinpw 3 days ago | parent [-] | |||||||||||||||||||||||||
> But I think more importantly, what is often missed in this analysis is that most programmers doing ML work aren't writing their own custom kernels. They're just using pytorch (or maybe something even more abstracted/multi-backend like keras 3.x) and let the library deal with implementation details related to their GPU. That would imply that AMD could just focus on implementing good PyTorch support on their hardware and they would be able to start taking market share. Which doesn't sound like much work compared with writing a full CUDA competitor. But that does not seem to be the strategy, which implies it is not so simple? I am not an ML engineer so don't have first hand experience, but those I have talked to say they depend on a lot more than just one or two key libraries. But my sample size is small. Interested in other perspectives... | ||||||||||||||||||||||||||
|