cowmix 2 hours ago

If you don't mind saying, what distro and/or Docker container are you using to get Qwen3 Coder Next going?

nyrikki an hour ago | parent | next [-]

I can't answer for the OP but it works fine under llama.cpp's container.
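
For reference, a minimal invocation looks roughly like the following (the image path is from the llama.cpp Docker docs; the exact Vulkan server tag is an assumption on my part, so check the ggml-org package list):

  # prebuilt server image; /dev/dri is needed so Vulkan can see the GPU
  docker run -p 8080:8080 --device /dev/dri \
    ghcr.io/ggml-org/llama.cpp:server-vulkan \
    -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q6_K_XL --host 0.0.0.0 --port 8080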

lambda an hour ago | parent | prev [-]

I'm running Fedora Silverblue as my host OS; this is the kernel:

  $ uname -a
  Linux fedora 6.18.9-200.fc43.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Feb  6 21:43:09 UTC 2026 x86_64 GNU/Linux
You also need to set a few kernel command-line parameters to allow most of your memory to be used as graphics memory. I have the following on my kernel command line; each value is 110 GiB expressed as a number of 4 KiB pages (I figure leaving 18 GiB or so for CPU memory is probably a good idea):

  ttm.pages_limit=28835840 ttm.page_pool_size=28835840
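If it helps, the page count is just 110 GiB divided by the 4 KiB page size, and on an ostree-based system like Silverblue the usual way to add kernel arguments is rpm-ostree kargs; a rough sketch (assuming a stock rpm-ostree install, and it takes effect after a reboot):

  # 110 GiB in 4 KiB pages: 110 * 1024 * 1024 * 1024 / 4096 = 28835840
  echo $(( 110 * 1024 * 1024 * 1024 / 4096 ))
  sudo rpm-ostree kargs --append=ttm.pages_limit=28835840 --append=ttm.page_pool_size=28835840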
Then I'm running llama.cpp in the official llama.cpp Docker containers. The Vulkan one works out of the box. I had to build the container myself for ROCm, since the llama.cpp container has ROCm 7.0 but I need 7.2 to be compatible with my kernel. I haven't actually compared the speed directly between Vulkan and ROCm yet; I'm pretty much at the point where I've just gotten everything working.

In a checkout of the llama.cpp repo:

  podman build -t llama.cpp-rocm7.2 -f .devops/rocm.Dockerfile --build-arg ROCM_VERSION=7.2 --build-arg ROCM_DOCKER_ARCH='gfx1151' .
Then I run the container with something like:

  podman run -p 8080:8080 --device /dev/kfd --device /dev/dri --security-opt seccomp=unconfined --security-opt label=disable \
    --rm -it -v ~/.cache/llama.cpp/:/root/.cache/llama.cpp/ -v ./unsloth:/app/unsloth llama.cpp-rocm7.2 \
    --model unsloth/MiniMax-M2.5-GGUF/UD-Q3_K_XL/MiniMax-M2.5-UD-Q3_K_XL-00001-of-00004.gguf \
    --jinja --ctx-size 16384 --seed 3407 --temp 1.0 --top-p 0.95 --min-p 0.01 --top-k 40 --port 8080 --host 0.0.0.0 -dio
Still getting my setup dialed in, but this is working for now.
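
(If you want a quick smoke test once the container is up, llama.cpp's server exposes an OpenAI-compatible chat endpoint on that port; something like:

  curl -s http://localhost:8080/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{"messages": [{"role": "user", "content": "Say hello"}]}'

should come back with a completion from the loaded model.)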

Edit: Oh, yeah, you had asked about Qwen3 Coder Next. That command was:

  podman run -p 8080:8080 --device /dev/kfd --device /dev/dri --security-opt seccomp=unconfined --security-opt label=disable \
    --rm -it -v ~/.cache/llama.cpp/:/root/.cache/llama.cpp/ -v ./unsloth:/app/unsloth llama.cpp-rocm7.2  -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q6_K_XL \
    --jinja --ctx-size 262144 --seed 3407 --temp 1.0 --top-p 0.95 --min-p 0.01 --top-k 40 --port 8080 --host 0.0.0.0 -dio
(As mentioned, I'm still getting this set up, so I've been moving back and forth between using `-hf` to pull directly from HuggingFace and using `uvx hf download` in advance; sorry that these commands are a bit messy. The problem with using `-hf` in llama.cpp is that you'll sometimes get surprise updates where it has to download many gigabytes before starting up.)
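
If you'd rather pin the files up front, the pre-download route is roughly this (the UD-Q6_K_XL folder name is my guess at the repo layout for that quant, so double-check it on the model page), and then you point `--model` at the downloaded path like in the MiniMax command above:

  uvx hf download unsloth/Qwen3-Coder-Next-GGUF \
    --include 'UD-Q6_K_XL/*' \
    --local-dir ./unsloth/Qwen3-Coder-Next-GGUF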