anaisbetts 4 days ago
This hasn't been my experience. ROCm is usually not only a bit slower for me (~32 t/s vs ~43 t/s on the main model I use), it's also way less reliable; any kernel or AMD driver upgrade and suddenly everything is broken.
SwellJoe 4 days ago | parent
It can be tricky to get and keep ROCm working, but around 7.2 it became reliable and as fast as or faster than ROCm 6.4. And I think ROCm's first-response time is pretty consistently faster than Vulkan's, even if Vulkan has a slightly higher token rate. Though I don't see that big of a difference in token rates, either. Honestly, I haven't done enough real testing to know for sure.

The benchmarks Donato Capitella posts (https://kyuz0.github.io/amd-strix-halo-toolboxes/) have been my guide on what to run and how, and most things that can run on the Strix Halo are Fast Enough(tm), so I don't agonize over performance. When Vulkan was all that worked with llama.cpp, that's what I used. Now that ROCm is reliable, I'm using ROCm. ROCm feels faster, maybe just because it processes prompts faster and starts typing the answer sooner (at a rate faster than I can read it, so when it starts answering is the more important metric to me, even if a higher token rate would finish sooner).

In short: if I'm ever doing something that will take many hours to complete and I need to optimize it, I'll run some tests first to be sure I'm using the optimal path. Otherwise, as long as ROCm is working, I'll probably just keep using it.
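The tradeoff being described, faster time-to-first-token versus higher steady-state token rate, is easy to sketch numerically. The numbers below are purely illustrative (not benchmarks from either commenter's setup), assuming one backend starts answering sooner and the other streams faster:

```python
def completion_time(ttft_s: float, tokens_per_s: float, n_tokens: int) -> float:
    """Total wall-clock time for a response: time to first token,
    plus the time to generate the remaining tokens at a steady rate."""
    return ttft_s + n_tokens / tokens_per_s

# Hypothetical numbers: backend A starts fast but streams slower,
# backend B takes longer on prompt processing but streams faster.
a = completion_time(ttft_s=0.5, tokens_per_s=32, n_tokens=500)  # 16.125 s
b = completion_time(ttft_s=2.0, tokens_per_s=43, n_tokens=500)  # ~13.6 s

# For a long answer the higher token rate wins on total time, but the
# faster-starting backend still *feels* more responsive if you read
# along as it streams, which is the point made above.
```

With numbers like these, the faster-streaming backend finishes long answers sooner, while the faster-starting one is ahead for short ones; where the crossover falls depends entirely on the real measured rates, which is why running a quick benchmark first makes sense for long jobs.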