Caum 8 hours ago

Been running local LLMs on my 7900 XTX for months and the ROCm experience has been... rough. The fact that AMD is backing an official inference server that handles the driver/dependency maze is huge. My biggest question is NPU support - has anyone actually gotten meaningful throughput from the Ryzen AI NPU vs just using the dGPU? In my testing the NPU was mostly a bottleneck for anything beyond tiny models.
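
(For reference, a minimal sanity check that the ROCm stack is actually wired up, assuming a ROCm build of PyTorch — the torch.cuda API is reused for HIP devices, so the same calls work on AMD cards:)

    import torch

    print(torch.__version__)                  # ROCm builds report something like "2.x.x+rocmY.Z"
    print(torch.cuda.is_available())          # True if the HIP runtime sees a GPU
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))  # should name the card, e.g. the 7900 XTX here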

mindcrime 4 hours ago | parent | next [-]

> Been running local LLMs on my 7900 XTX for months and the ROCm experience has been... rough.

Just out of curiosity... how so?

I only ask because I've been running local models (using Ollama) on my RX 7900 XTX for the last year and a half or so and haven't hit a single ROCm-specific problem that I can think of. Actually, I've barely had any problems at all, other than the card being limited to 24GB of VRAM. :-(

I'm halfway tempted to splurge on a Radeon Pro board to get more VRAM, but ... haven't bitten the bullet yet.

lrvick 6 hours ago | parent | prev | next [-]

I have had way better perf with Vulkan than with ROCm on kernel 7.0.0. They made some major improvements. 20%+ speedups for me.
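
(A quick way to put a number on that, as a sketch rather than a proper benchmark: both the llama.cpp server and Ollama expose an OpenAI-compatible endpoint, so you can time a fixed completion against the ROCm build and the Vulkan build and compare tok/s. The URL and model name below are placeholders for whatever you run locally, and it assumes the server populates the usage field:)

    import json, time, urllib.request

    URL = "http://localhost:8080/v1/chat/completions"  # placeholder; llama.cpp server default port
    payload = {
        "model": "local-model",  # placeholder model name
        "messages": [{"role": "user", "content": "Explain GPUs in one paragraph."}],
        "max_tokens": 256,
        "stream": False,
    }

    req = urllib.request.Request(URL, data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"})
    start = time.time()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    elapsed = time.time() - start

    tokens = body["usage"]["completion_tokens"]
    print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} tok/s")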

cl0ckt0wer 8 hours ago | parent | prev [-]

The NPU is more for power efficiency when on battery. I don't think it's a replacement for the GPU.

htrp 6 hours ago | parent [-]

What kind of tps slowdown would you realistically see on an NPU vs a GPU?
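
(For rough intuition only: the decode phase is mostly memory-bandwidth bound, so a crude upper bound is tok/s ≈ usable bandwidth / bytes of weights read per token. The numbers below are ballpark assumptions, not measurements:)

    # ~7-8B params at 4-bit quantization, plus overhead (assumption)
    model_bytes = 4.5e9

    bandwidth = {
        "7900 XTX (GDDR6)": 960e9,               # spec-sheet figure
        "Ryzen AI NPU (shared LPDDR5X)": 120e9,  # rough system-memory figure, varies by SKU (assumption)
    }

    for device, bw in bandwidth.items():
        print(f"{device}: ~{bw / model_bytes:.0f} tok/s upper bound")

By that estimate the gap is closer to an order of magnitude than a small constant factor, and it's set mostly by the memory the NPU hangs off rather than the NPU compute itself.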