porphyra 7 hours ago
You can also run the Qwen 3.6 27B dense model on a DGX Spark with comparable performance [1][2] for about $4000 (the Asus Ascent GX10 is $3999 at various retailers). In theory you can also get 48GB of VRAM with, say, two 3090s, but that setup will take up a lot of space and generate a lot of heat compared to the MacBook Pro and GB10.
esperent 7 hours ago
> 48GB of VRAM with, say, two 3090s

So like... $2000+ just for the used GPUs? Plus I assume it's considerably more effort to get it working.
lee_ars 3 hours ago
The tweet you link shows "Qwen 3.6 35b NVFP4 - 256k ctx, 110 tok/s", but I'm getting only about half that, around 50 tok/sec, on a DGX Spark with Qwen3.6-35B-A3B-NVFP4 (via vLLM) plus speculative decoding with EAGLE3. I'd be ecstatic to see 110 tok/sec, and I wish they had more sourcing for the exact config.

Edit: after actually reading the tweets (had to use xcancel) and visiting the source git repo, switching to MTP for speculative decoding makes things a hell of a lot faster, and the abliterated model plus dflash makes it even faster! I'm now seeing 70-90 tok/sec for most stuff. I like!
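For anyone comparing their own numbers against these, here is a rough sketch of the arithmetic in Python. The helper names (`tokens_per_second`, `speedup`) are made up for illustration and are not from vLLM or the linked repo; the only inputs are the tok/sec figures quoted in this thread.

```python
def tokens_per_second(tokens_generated: int, elapsed_seconds: float) -> float:
    """Decode throughput: generated tokens divided by wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return tokens_generated / elapsed_seconds


def speedup(new_tok_s: float, baseline_tok_s: float) -> float:
    """Ratio of two throughput figures (hypothetical helper)."""
    if baseline_tok_s <= 0:
        raise ValueError("baseline_tok_s must be positive")
    return new_tok_s / baseline_tok_s


# Figures quoted in the thread (tok/sec).
reported = 110.0                   # number shown in the linked tweet
eagle3 = 50.0                      # vLLM + EAGLE3 speculative decoding
mtp_low, mtp_high = 70.0, 90.0     # after switching to MTP (plus dflash)

print(f"tweet vs EAGLE3 run: {speedup(reported, eagle3):.2f}x")
print(f"MTP vs EAGLE3 run:   {speedup(mtp_low, eagle3):.2f}x to {speedup(mtp_high, eagle3):.2f}x")
print(f"MTP as share of tweet figure: {mtp_low / reported:.0%} to {mtp_high / reported:.0%}")
```

To measure your own figure, count the tokens in a completion, time the request, and pass both to `tokens_per_second`. Keep in mind that prompt length and context size affect the result, so single runs are only a rough guide.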