roenxi 4 days ago

I thought the point of something like Strix Halo was to avoid ROCm altogether? AMD's strategy seems to have been to unify GPU/CPU memory and then let people write their own libraries.

The industry looks like it has started to move towards Vulkan. If AMD cards have figured out how to reliably run compute shaders without locking up (never a given in my experience, though that was some time ago), then there shouldn't be a reason to use specialty APIs or software written by AMD outside of drivers.

ROCm was always a bit problematic, but the issue was that if AMD cards weren't good enough for AMD's own engineers to reliably support tensor multiplication, there was no way anyone else was going to be able to do it. It isn't as if anyone is confused about multiplying matrices together: the naive algorithm is a core undergrad topic, and the advanced algorithms surely aren't that crazy to implement. It was never a library problem.
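(For reference, the naive algorithm alluded to above really is only a few lines; a minimal Python sketch, purely illustrative:

```python
def matmul(a, b):
    """Naive O(n^3) matrix multiplication: c[i][j] = sum_k a[i][k] * b[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    assert all(len(row) == m for row in a), "inner dimensions must match"
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

# Example: (2x2) @ (2x2)
print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The hard part, as the comment says, was never the math; it's making this run fast and reliably on the GPU.)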

SwellJoe 4 days ago | parent [-]

You misunderstand the point, and ROCm. The GPU and CPU share memory; that doesn't mean you no longer need to interact with the GPU.

You can use Vulkan instead of ROCm on Radeon GPUs, including on the Strix Halo (and for a while, Vulkan was more likely to work on the Strix Halo, as ROCm support was slow to arrive and stabilize), but you need something that talks to the GPU.

Current ROCm, 7.2.1, works quite well on the Strix Halo. Vulkan does, too. ROCm tends to be a little faster; not always, but mostly. People used to benchmark to figure out which was best for a given model/workload, but now I think most folks just assume ROCm is the better choice and use it exclusively. That's what I do, though I did find Gemma 4 wouldn't work on ROCm for a little while after release (I think that was a llama.cpp issue, though).

anaisbetts 4 days ago | parent | next [-]

This hasn't been my experience. ROCm is usually not only a bit slower for me (~32 t/s vs ~43 t/s on the main model I use), it is way less reliable: any upgrade in kernel version or AMD driver and suddenly everything is broken.

SwellJoe 4 days ago | parent [-]

It can be tricky to get and keep ROCm working, but around 7.2 it became reliable and as fast as, or faster than, ROCm 6.4.

And I think ROCm's time to first response is pretty consistently faster than Vulkan's, even if Vulkan has a slightly higher token rate. Though I don't see that big of a difference in token rates, either. Honestly, I haven't done enough real testing to know for sure. The benchmarks Donato Capitella posts (https://kyuz0.github.io/amd-strix-halo-toolboxes/) have been my guide on what to run and how, and most things that can run on the Strix Halo are Fast Enough(tm), so I don't agonize about performance. When Vulkan was all that worked with llama.cpp, that's what I used. Now that ROCm is reliable, I'm using ROCm. ROCm feels faster, maybe just because it processes prompts faster and starts typing the answer sooner (at a rate faster than I can read, so when it starts answering is the more important metric, even if a faster token rate would finish sooner).
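(That trade-off is easy to make concrete. With illustrative numbers only, not real Strix Halo benchmarks, a backend with faster prompt processing can start answering much sooner even if its decode rate is lower:

```python
def latency(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Time to first token and total time, assuming constant rates."""
    ttft = prompt_tokens / prefill_tps           # prompt processing
    total = ttft + output_tokens / decode_tps    # plus generation
    return ttft, total

# Made-up rates: backend A has faster prefill, backend B faster decode.
a = latency(2000, 500, prefill_tps=800, decode_tps=32)
b = latency(2000, 500, prefill_tps=300, decode_tps=43)
print(a)  # ttft = 2.5 s, total ~ 18.1 s
print(b)  # ttft ~ 6.7 s, total ~ 18.3 s
```

Backend A starts answering far sooner, while the total times are nearly identical, which is why time to first token can matter more than raw token rate.)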

In short: If ever I'm doing something that will take many hours to complete, and I need to optimize it, I'll do some tests first to be sure I'm using the optimal path. Otherwise, as long as ROCm is working, I'll probably just keep using it.

roenxi 4 days ago | parent | prev [-]

> The GPU and CPU share memory, that doesn't mean you don't need to interact with the GPU, anymore.

But we already have software that talks to the GPU: Mesa3D and the ecosystem around it. It has existed for decades. My understanding was that the main reason not to use it was that memory management was too complicated, and CUDA solved that problem.

If memory gets unified, what is the value proposition of ROCm over Mesa3D? Why does AMD need to invent a new way to communicate with GPUs? Why would it be faster?

SwellJoe 4 days ago | parent | next [-]

> CUDA solved that problem.

CUDA is a proprietary Nvidia product. CUDA solved the problem for Nvidia chips.

On AMD GPUs, you use ROCm. On Intel, you use OpenVINO. On Apple silicon, you use MLX. All work fine for the common AI tasks you'd want to do on self-hosted hardware. CUDA was there first, so it has a more mature ecosystem, but so far I've found zero models or tasks I haven't been able to use with ROCm. llama.cpp works fine. ComfyUI works fine. The Transformers library works fine. LM Studio works fine.

Unless you believe an Nvidia monopoly on inference and training of AI models is good for the world, you can't oppose the other GPU makers having a way for their chips to be used for those purposes. CUDA is a proprietary, vendor-specific solution.

Edit: But, also, Vulkan works fine on the Strix Halo. It is reliable and usually not that much slower than ROCm (and occasionally faster, somehow). Here's some benchmarks: https://kyuz0.github.io/amd-strix-halo-toolboxes/

roenxi 4 days ago | parent | next [-]

Why? What is the point of focusing on something that seems to be a memory management solution when the memory management problem has theoretically just gone away?

That has been one of the big themes in GPU hardware since around the 2010 era, when AMD committed to ATI: Nvidia tried to solve the memory management problem in the software layer, while AMD committed to doing it in hardware. Software was the better bet by around a trillion dollars so far, but if the hardware solutions have finally borne fruit, then why the focus on ROCm?

SwellJoe 4 days ago | parent [-]

I dunno. GPU programming and performance is above my pay grade. I assume the reason every GPU maker is investing in software is because they understand the problems to be solved and feel it's worth the investment to solve them. I like AMD because their Linux drivers are open source. I like Intel because all their stuff is Open Source. I like Nvidia notably less because none of their stuff is Open Source, not even the Linux drivers.

sabedevops 4 days ago | parent | prev [-]

The problem with ROCm, unlike CUDA, is that it doesn't run on much of AMD's own hardware, most notably their iGPUs.

SwellJoe 4 days ago | parent [-]

Yeah, that kinda sucks, but all their new-generation onboard GPUs are supported by ROCm, e.g. the Ryzen AI 395 and 400 series, which will be found in mid-to-high-end laptops, desktops, and motherboards. They seem to have realized that the reason Nvidia is kicking their ass is that people can develop with CUDA on all sorts of hardware, including their personal laptop or desktop.

dragontamer 4 days ago | parent | prev [-]

> If memory gets unified, what is the value proposition of ROCm supposed to be over mesa3d? Why does AMD need to invent some new way to communicate with GPUs? Why would it be faster?

And what about memory barriers? How do you sync the L1/L2 caches of a CPU core with the GPU's caches?

Exactly: that is what ROCm's memory barriers are for, enabling parallelism between CPU and GPU while also providing a mechanism for synchronization.

The GPU and CPU can share memory, but they do not share caches. You need programming effort to make ANY of this work.
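(The same issue shows up anywhere two independent execution units share memory. As an analogy only, with Python threads standing in for "CPU" and "GPU" rather than real HIP/ROCm code, the consumer still needs an explicit signal that the producer's writes are complete before it can safely read them:

```python
import threading

# Analogy: threads share memory, but the reader must still synchronize
# with the writer before the data is safe to consume.
buffer = [0] * 4
done = threading.Event()   # plays the role of a fence + completion signal

def producer():            # the "GPU kernel" filling shared memory
    for i in range(len(buffer)):
        buffer[i] = i * i
    done.set()             # publish: all writes are finished

def consumer():            # the "CPU" reading the result
    done.wait()            # without this, it may see stale or partial data
    print(buffer)          # [0, 1, 4, 9]

t = threading.Thread(target=producer)
t.start()
consumer()
t.join()
```

On real hardware the equivalent roles are played by cache flushes, memory fences, and device synchronization calls, which is exactly the kind of plumbing a runtime like ROCm provides.)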