wizee 2 hours ago

Qwen 3.6 27B is quite good for agentic coding, and practical to run on consumer hardware. You need a system with either 32+ GB VRAM, or a unified memory system with 48+ GB of memory and a decent integrated GPU. While not cheap, such a setup is still attainable for much of the world, and will get cheaper over time. Open models hosted on non-American clouds also remain an option with a much lower barrier to entry, for cases where privacy is less critical.
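
A rough back-of-the-envelope sketch of where memory figures like these come from, assuming weights dominate and each parameter takes `bits_per_weight / 8` bytes. The function names and the `overhead_gb` allowance (KV cache, activations, runtime buffers) are hypothetical illustrations, not numbers from any particular runtime:

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    # 1 billion parameters at 8 bits each is 1 GB, so scale by bits / 8.
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float,
         vram_gb: float, overhead_gb: float) -> bool:
    """True if weights plus an assumed overhead fit in the given memory."""
    return weight_memory_gb(params_billions, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    # Assumed quantization levels; real quantized formats vary in bits per weight.
    for bits in (4, 8, 16):
        print(bits, "bit:", weight_memory_gb(27, bits), "GB of weights")
```

Under these assumptions a 27B model needs about 13.5 GB of weights at 4 bits and 27 GB at 8 bits, which is roughly why 32 GB leaves room for context at the higher-precision end.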

jochem9 2 hours ago | parent | next [-]

There was an article on HN a few weeks ago where someone detailed how they managed to get an old datacenter GPU to run in their consumer PC, getting decent performance with Qwen. They spent something like $200 on the GPU (second-hand, of course).

So yeah, I think models on local hardware will be quite common soon among the tech savvy (such as people creating software).

wrs an hour ago | parent [-]

Especially considering the millions of 2026-class data center GPUs that massively overinvested companies are currently buying, which will be obsolete in a few years.

treis an hour ago | parent | next [-]

I think those are going to be run until they die. The capex is too high relative to the opex to retire them after a few years. They'll keep serving current-gen LLMs for as long as they keep running.

Chu4eeno 29 minutes ago | parent [-]

They can also be used for things other than running the main frontier model.

E.g. Grok isn't truly multimodal: it has a callable tool, a separate VLM that it invokes on image URLs or files (for a long time that was grok-1.5v, which was pretty bad, but I think they have upgraded since).

And then you have the small summarizer models for the CoT/thought traces, the guidable summarizer models for the standard browse tools, etc.

There's a ton of stuff that can use an aging GPU.

nok22kon 15 minutes ago | parent | prev [-]

H100s were released in Oct 2022. They are now more expensive than at release time.

schmuhblaster 2 hours ago | parent | prev | next [-]

Indeed, and with some tinkering around the harness it can even punch way above its weight.

thewebguyd an hour ago | parent | prev [-]

> You need a system with either 32+ GB VRAM

I do hope you're right that it will get cheaper over time (it should), but right now 32GB of VRAM is not affordable for a lot of people. You're talking ~$4500 just for the GPU, or ~$800 used if you can find one.

daan-k an hour ago | parent [-]

For inference you can split the 32GB between two 16GB cards. Two new 5060 Tis for ~€1000 in total are more than fine.

It's a tad less efficient and a bit more of a hassle, but still a good experience for only a fraction of the price.
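
As a minimal sketch of the idea, assuming the model is split by whole layers and each card gets a share proportional to its memory: `split_layers` and the layer count of 64 are hypothetical, and real inference runtimes have their own split logic and options.

```python
def split_layers(n_layers: int, vram_gb: list[float]) -> list[int]:
    """Assign whole layers to cards in proportion to each card's memory."""
    total = sum(vram_gb)
    # Floor of each card's proportional share.
    counts = [int(n_layers * v / total) for v in vram_gb]
    # Hand any leftover layers to the largest cards first.
    leftover = n_layers - sum(counts)
    order = sorted(range(len(vram_gb)), key=lambda i: vram_gb[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # Two 16GB cards, hypothetical 64-layer model.
    print(split_layers(64, [16, 16]))
```

With two equal cards each ends up holding half the layers, so activations cross between the cards once per forward pass, which is one source of the efficiency loss mentioned above.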
