An R9700 costs $1,350 and, with a bit of tuning of llamacpp-vulkan, can get 100 tokens/s (TPS) running Qwen3.6-35B-A3B at Q5 with a 130k context window (with room to spare). But llama.cpp's repository instability and lack of real versioning frustrate me.
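A rough way to sanity-check a "130k context with room to spare" claim is to add the quantized weight size and the KV-cache size and compare the total against the card's VRAM. This is only a sketch. The 32 GiB capacity, the ~5.5 bits per weight for a Q5-class quant, and the layer/head counts are all assumed placeholders, not Qwen3.6's actual configuration. It also ignores compute buffers and runtime overhead. For an MoE model, all 35B parameters must be resident in VRAM even though only a few billion are active per token.

```python
# Rough VRAM budget: quantized weights + KV cache.
# All architecture numbers below are HYPOTHETICAL placeholders, not Qwen's real config.

def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30

def kv_cache_gib(n_ctx: int, n_layers: int, n_kv_heads: int, head_dim: int,
                 bytes_per_elem: float = 2.0) -> float:
    """K and V tensors: 2 * layers * kv_heads * head_dim values per token (fp16 by default)."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return n_ctx * per_token / 2**30

if __name__ == "__main__":
    vram_gib = 32.0                  # assumed card capacity
    w = weights_gib(35e9, 5.5)       # assumed ~5.5 bits/weight for a Q5-class quant
    kv = kv_cache_gib(131_072, n_layers=16, n_kv_heads=2, head_dim=256)  # placeholders
    print(f"weights ~{w:.1f} GiB, KV ~{kv:.1f} GiB, left ~{vram_gib - w - kv:.1f} GiB")
```

Quantizing the KV cache or shrinking the context window shrinks the second term, which is usually where the "room to spare" comes from.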

In terms of electricity, if you aren't using it, even with all the VRAM loaded, you're wasting about 30 watts at most.
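To put that idle draw in perspective, a constant 30 W adds up to about 263 kWh per year. Here is a quick sketch of the yearly cost, assuming a hypothetical $0.15/kWh electricity rate (substitute your own):

```python
def idle_cost_per_year(watts: float, price_per_kwh: float) -> tuple[float, float]:
    """Return (kWh per year, cost per year) for a constant power draw."""
    kwh = watts * 24 * 365 / 1000
    return kwh, kwh * price_per_kwh

kwh, cost = idle_cost_per_year(30, 0.15)  # $0.15/kWh is a hypothetical rate
print(f"{kwh:.1f} kWh/year, about ${cost:.2f}/year")
```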

Prompt processing a large uncached context is annoying, which is why I forced a lower context window, but I don't know if it's any worse in performance than the cloud models I've used.
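The pain of a large uncached prompt is simple arithmetic: every uncached token has to be processed before the first output token appears. Here is a minimal sketch. The 1,000 tok/s prompt-processing rate is a placeholder, not a measured R9700 figure.

```python
def prefill_seconds(prompt_tokens: int, pp_tokens_per_s: float,
                    cached_tokens: int = 0) -> float:
    """Time spent on the uncached part of a prompt before the first output token."""
    uncached = max(prompt_tokens - cached_tokens, 0)
    return uncached / pp_tokens_per_s

# 1,000 tok/s is a placeholder prompt-processing rate.
print(prefill_seconds(130_000, 1_000))                         # 130.0
print(prefill_seconds(130_000, 1_000, cached_tokens=125_000))  # 5.0
```

When a prefix is already cached, only the new tokens pay the prefill cost. That is why capping the context window mostly helps the cold-start case.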

To me, there's something nice about knowing I don't have to rent it anymore. If you rent it, the terms can change regularly.

rsync 2 hours ago

"An R9700 is $1350 and can get 100 TPS running Qwen3.6-35B-A3B Q5 with 130k context window ..."

How would that change (improve) if you had two R9700s in a similar configuration?

		vardalab an hour ago
		Better prompt processing, roughly 1.5x or more, and more room for KV cache, but token generation (tg) is most likely lower, around 0.8x. I'm just going by memory for Qwen3.5 without MTP, though.
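If you take those rough multipliers at face value, whether a second card helps depends on the workload. The sketch below assumes a placeholder one-card prompt-processing rate (only the 100 TPS generation figure comes from upthread) and ignores prompt caching:

```python
def request_seconds(prompt_tokens: int, output_tokens: int,
                    pp_tps: float, tg_tps: float) -> float:
    """Prefill time plus generation time for one uncached request."""
    return prompt_tokens / pp_tps + output_tokens / tg_tps

PP_ONE = 1_000.0               # HYPOTHETICAL one-card prompt-processing rate (tok/s)
TG_ONE = 100.0                 # one-card generation rate quoted upthread (tok/s)
PP_GAIN, TG_FACTOR = 1.5, 0.8  # rough two-card multipliers from the reply

# Long prompt, modest output: faster prefill dominates.
one_long = request_seconds(50_000, 2_000, PP_ONE, TG_ONE)
two_long = request_seconds(50_000, 2_000, PP_ONE * PP_GAIN, TG_ONE * TG_FACTOR)

# Short prompt, same output: slower generation dominates.
one_short = request_seconds(1_000, 2_000, PP_ONE, TG_ONE)
two_short = request_seconds(1_000, 2_000, PP_ONE * PP_GAIN, TG_ONE * TG_FACTOR)

print(f"long prompt:  1 card {one_long:.1f}s, 2 cards {two_long:.1f}s")
print(f"short prompt: 1 card {one_short:.1f}s, 2 cards {two_short:.1f}s")
```

Under these assumptions, two cards come out ahead for long-context work and behind for chatty, generation-heavy use.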

bertili 5 hours ago

Qwen 27B is a compute-heavy dense model.
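A common rule of thumb puts the forward pass at about 2 FLOPs per active parameter per generated token. With that, you can roughly estimate the gap between a dense 27B model and an A3B MoE model. This sketch assumes the "A3B" suffix means roughly 3B active parameters per token, and it ignores attention costs that grow with context:

```python
def forward_gflops_per_token(active_params: float) -> float:
    """Rule of thumb: ~2 FLOPs per active parameter per generated token."""
    return 2 * active_params / 1e9

dense = forward_gflops_per_token(27e9)  # dense: every parameter is active
moe = forward_gflops_per_token(3e9)     # assumed ~3B active parameters per token
print(f"dense ~{dense:.0f} GFLOPs/token, MoE ~{moe:.0f} GFLOPs/token, "
      f"ratio ~{dense / moe:.0f}x")
```

Decode speed is also bounded by memory bandwidth, which this estimate does not model. Still, it shows why the dense model needs far more compute per token.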