▲ mandevil 6 hours ago
Yeah, ROCm-focused code will always beat generic code compiled down, but this is a really difficult game to win. For example, Deepseek R-1 was released optimized for running on Nvidia hardware and needed some adaptation to run as well on ROCm, for exactly the same reasons that hand-tuned ROCm code beats generic code compiled to ROCm. Basically, the Deepseek team, for their own purposes, built R-1 to fit Nvidia's way of doing things (because Nvidia is market-dominant). Once they released it, someone like Elio or AMD had to do the work of adapting the code to run best on ROCm. More established players who aren't out-of-left-field surprises like Deepseek, e.g. Meta with its Llama series, mostly coordinate with AMD ahead of release day, but I suspect AMD still has to pay for that engineering work itself, while Meta does the Nvidia work themselves. This simple fact, that every researcher makes their stuff work on CUDA themselves while AMD or someone like Elio has to do the work to make it equally performant on ROCm, is what keeps people in the CUDA universe.
▲ latchkey 2 hours ago | parent
Kimi is the latest model that isn't running correctly on AMD. Apparently it's close to Deepseek in design, but different enough that it just doesn't work. And it isn't just the model, it's the engine that runs it: from what I understand, this model works with sglang but not with vLLM.