Remix.run Logo
cyanydeez 5 hours ago

Consumer hardware is there. grab a mac or AMD395+ and Qwen coder and Cline or Open code and you're getting 80% of the real efficiency.

smilekzs an hour ago | parent [-]

New Strix Halo (395+) user here. It is very librating to be able to "just" load the larger open-weight MoEs. At this param count class, bigger is almost always better --- my own vibe check confirms this, but obviously this is not going to be anywhere close to the leading cost-optimized closed-weight models (Flash / Sonnet).

The tradeoff with these unified LPDDR machines is compute and memory throughput. You'll have to live with the ~50 token/sec rate, and compact your prefix aggressively. That said, I'd take the effortless local model capability over outright speed any day.

Hope the popularity of these machines could prompt future models to offer perfect size fits: 80 GiB quantized on 128 GiB box, 480 GiB quantized on 512 GiB box, etc.