nyrikki 3 hours ago
It's crazy to me that it's that slow. 4-bit quants don't lose much with Qwen3 Coder Next, and unsloth/Qwen3-Coder-Next-UD-Q4_K_XL gets 32 tokens per second (tps) on a 3090 (24 GB) passed through to a VM, with a 256k context size in llama.cpp. Likewise, unsloth/gpt-oss-120b-GGUF:F16 gets 25 tps, and gpt-oss-20b gets 195 tps! The advantage of this setup is that you can use the APU for booting, pass the GPU through to a VM, and get nice, safer VMs for agents at the same time, all while using DDR4, IMHO.
lambda 2 hours ago
Yeah, this is an integrated GPU in an AMD laptop, not a discrete NVIDIA GPU in a desktop. Also, I haven't done much to tweak performance; this is just the first setup I've gotten working.