llama.cpp (and LM Studio, which is built on top of it) has a Vulkan backend which is pretty fast. I'm using it to run models on a Strix Halo laptop and it works pretty well.
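For anyone wanting to try it, a rough sketch of building llama.cpp with the Vulkan backend and running a model on it (the model path is just a placeholder; you'll need the Vulkan SDK installed for the build to find it):

```shell
# Build llama.cpp with the Vulkan backend enabled
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release

# Run a GGUF model, offloading layers to the GPU via -ngl
# (model file here is hypothetical -- substitute your own)
./build/bin/llama-cli -m ./models/my-model.gguf -ngl 99 -p "Hello"
```

On an iGPU setup like Strix Halo the unified memory means even large `-ngl` values generally work, since the GPU shares system RAM.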