trollbridge 4 hours ago

Right. Qwen 3.6 45b (6 parameter) runs on a commodity 5090, which you probably already have if you're into video games. It's entirely usable for most code generation tasks. (Not all, but most.)

Likewise, DeepSeek V4 Flash is quite accessible locally, with DwarfStar 4 making it easy to run on a 96GB MacBook.

There's nothing wrong with paying for inference, but local models open up some pretty amazing possibilities: entirely offline usage, working on private data (PII, legally privileged material, and so on), or running tasks with no concern whatsoever about billing overruns.
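A minimal sketch of what "entirely offline" looks like in practice, assuming a local server that speaks the OpenAI-style chat-completions protocol (llama.cpp's server and Ollama both expose one); the host, port, and model name here are placeholders, not anything from the comment:

```python
import json
import urllib.request

# Hypothetical local endpoint; adjust host/port/model for your own setup.
LOCAL_URL = "http://localhost:8080/v1/chat/completions"

def build_request(prompt: str, model: str = "qwen-local") -> urllib.request.Request:
    """Build a chat-completion request aimed at a local server.

    The payload never leaves the machine, so PII or privileged material
    stays local, and there is no metered billing to overrun.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        LOCAL_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},  # no API key needed locally
    )

if __name__ == "__main__":
    req = build_request("Summarize the attached document.")
    # Only reaches localhost; works with the network cable unplugged.
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because the wire format matches the hosted APIs, the same client code can later be pointed at a cloud endpoint by swapping the URL and adding a key.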

The other possibility is building a service you can be 100% assured will keep running, without worrying about an upstream provider going down or being end-of-lifed, which is currently a real problem with frontier models. My local Qwen setup is entirely predictable: it can run as long as I can keep finding hardware to run it.

A sensible strategy uses both: keep local inference tools available, and use both low-cost and high-cost cloud models. Use GPT-5.5 and Opus-4.7 for demanding reasoning tasks they excel at (including laundering the latter through a Claude subscription to make it cheaper), DeepSeek V4 Pro for slightly less demanding tasks, V4 Flash for most (not all) code generation, and local models for anything where you want a local model.
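The tiering above can be sketched as a tiny router. The tier names and routing rules are my own illustration, not something the comment specifies; the model names just mirror the ones mentioned:

```python
# Hypothetical task router for the mixed local/cloud strategy.
# Tiers and fallbacks are illustrative assumptions, not a real API.
ROUTES = {
    "demanding-reasoning": ["gpt-5.5", "opus-4.7"],  # frontier cloud models
    "moderate":            ["deepseek-v4-pro"],      # mid-tier cloud
    "codegen":             ["deepseek-v4-flash"],    # cheap cloud, most codegen
    "private-or-offline":  ["qwen-local"],           # never leaves the machine
}

def pick_models(task_tier: str, offline: bool = False) -> list[str]:
    """Return candidate models for a task, forcing local when offline or private."""
    if offline or task_tier == "private-or-offline":
        return ROUTES["private-or-offline"]
    # Unknown tiers fall back to the cheap codegen tier rather than a frontier model.
    return ROUTES.get(task_tier, ROUTES["codegen"])
```

The useful property is that the privacy/offline constraint overrides cost-based routing, so sensitive work can never accidentally escalate to a cloud model.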