I have been using lemonade for nearly a year already. On Strix Halo I am using nothing else - although kyuz0's toolboxes are also nice (https://kyuz0.github.io/amd-strix-halo-toolboxes/)

Nowadays you get TTS, STT, text & image generation and image editing should also be possible. Besides being able to run via rocm, vulkan or on CPU, GPU and NPU. Quite a lot of options. They have a quite good and pragmatic pace in development. Really recommend this for AMD hardware!

Edit: OpenAI and i think nowaday ollama compatible endpoints allow me to use it in VSCode Copilot as well as i.e. Open Web UI. More options are shown in their docs.

▲

UncleOxidant 4 hours ago | parent | next [-]

How much of a speedup might I get for, say, Qwen3.5-122B if I were to run with lemonade on my Strix Halo vs running it using vulkan with llama.cpp ?

	▲	sawansri 2 hours ago \| parent [-]
		You would get similar performance. Lemonade is designed as a turnkey (optimized for AMD Hardware) for local AI models. The software helps you manage backends (llama.cpp, flm, whispercpp, stable‑diffusion.cpp, etc) for different GenAI modalities from a single utility. On the performance side, lemonade comes bundled with ROCm and Vulkan. These are sourced from https://github.com/lemonade-sdk/llamacpp-rocm and https://github.com/ggml-org/llama.cpp/releases respectively.

▲

syntaxing 7 hours ago | parent | prev [-]

Have you used it with any agents or claw? If so, which model do you run?

▲

dennemark 7 hours ago | parent | next [-]

I have two Strix Halo devices at hand. Privately a framework desktop with 128gb and at work 64GB HP notebook. The 64GB machine can load Qwen3.5 30B-A3B, with VSCode it needs a bit of initial prompt processing to initialize all those tools I guess. But the model is fighting with the other resources that I need. So I am not really using it anymore these days, but I want to experiment on my home machine with it. I just dont work on it much right now.

Lemonade has a Web UI to set the context size and llama.cpp args, you need to set context to proper number or just to 0 so that it uses the default. If its too low, it wont work with agentic coding.

I will try some Claw app, but first need to research the field a bit. But I am using different models on Open Web UI. GPT 120B is fast, but also Qwen3.5 27B is fine.

▲

cpburns2009 7 hours ago | parent [-]

Qwen3-Coder-Next works well on my 128GB Framework Desktop. It seems better at coding Python than Qwen3.5 35B-A3B, and it's not too much slower (43 tg/s compared to 55 tg/s at Q4).

27B is supposed to be really good but it's so slow I gave up on it (11-12 tg/s at Q4).

▲

vlowther an hour ago | parent | next [-]

The 8 bit MLX unsloth quant of qwen3-coder-next seems to be a local best on an MBB M5 Max with 128GB memory. With oMLX doing prompt caching I can run two in parallel doing different tasks pretty reasonably. I found that lower quants tend to lose the plot after about 170k tokens in context.

	▲	cpburns2009 44 minutes ago \| parent [-]
		That's good to know. I haven't exceeded a 120k context yet. Maybe I'll bite the bullet and try Q6 or Q8. Any of coder-next quants larger than UD-Q4_K_XL take forever to load, especially with ROCm. I think there's some sort of autotuning or fitting going in llama.cpp.

▲

UncleOxidant 4 hours ago | parent | prev [-]

Agreed. Qwen3-coder-next seems like the sweetspot model on my 128GB Framework Desktop. I seem to get better coding results from it vs 27b in addition to it running faster.

▲

lrvick 4 hours ago | parent | prev [-]

As another data point.

Running Qwen3.5 122B at 35t/s as a daily driver using Vulcan llama.cpp on kernel 7.0.0rc5 on a Framework Desktop board (Strix Halo 128).

Also a pair of AMD AI Pro r9700 cards as my workhorses for zimageturbo, qwen tts/asr and other accessory functions and experiments.

Finally have a Radeon 6900 XT running qwen3.5 32B at 60+t/s for a fast all arounder.

If I buy anything nvidia it will be only for compatibility testing. AMD hardware is 100% the best option now for cost, freedom, and security for home users.

	▲	plagiarist 2 hours ago \| parent \| next [-]
		How is the performance for Z-Image on the R9700s?
	▲	syntaxing 2 hours ago \| parent \| prev [-]
		Are the dedicated GPU cards on another machine or you’re using eGPU with the framework?