Aurornis 4 hours ago
At what quantization and with what size context window?
GrayShade 2 hours ago | parent
Looks like it's a bit slower today. Running llama.cpp b8192 Vulkan.

    $ ./llama-cli unsloth_Qwen3.5-35B-A3B-GGUF_Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf -c 65536 -p "Hello"
    [snip 73 lines]
    [ Prompt: 86,6 t/s | Generation: 34,8 t/s ]

    $ ./llama-cli unsloth_Qwen3.5-35B-A3B-GGUF_Qwen3.5-35B-A3B-UD-Q4_K_XL.gguf -c 262144 -p "Hello"
    [snip 128 lines]
    [ Prompt: 78,3 t/s | Generation: 30,9 t/s ]

I suspect the ROCm build will be faster, but it doesn't work out of the box for me.
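For what it's worth, the figures above imply only a modest cost for the larger context allocation. A minimal Python sketch of the arithmetic (the `slowdown` helper is just illustrative, not anything from llama.cpp):

    # Fractional throughput loss when moving from a 64K to a 256K
    # context window, using the t/s figures quoted above.
    def slowdown(fast: float, slow: float) -> float:
        return (fast - slow) / fast

    prompt_loss = slowdown(86.6, 78.3)  # prompt processing
    gen_loss = slowdown(34.8, 30.9)     # token generation

    print(f"prompt: {prompt_loss:.1%}, generation: {gen_loss:.1%}")
    # prompt: 9.6%, generation: 11.2%

So going from `-c 65536` to `-c 262144` costs roughly 10% throughput on both phases in this run.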