sleepyeldrazi 4 hours ago:
If you want a good dense model, use Qwen3.6 27B instead: speed will be up, and if you don't take my word for it being smarter, let OpenRouter's prices for it versus the bigger, slower, and less memory-efficient Gemma do the talking. If you want a faster model, go for Qwen3.6 35B (or Gemma 4 26B if Gemma models perform better for your tasks).

There is a reason people (myself included) haven't shut up about those two (especially the 27B). It's small enough to run at a decent speed (especially with the built-in MTP that finally has official llama.cpp support), and for many workloads (every benchmark I have ever thrown at it) it is matching or surpassing models it has no right to.

A couple of days ago I woke up to my internet being down, started the 27B in pi, gave it my router's password and told it to diagnose what was wrong, went to grab a coffee, and by the time I got back I had a full report with suggestions on how to proceed.

I love OpenRouter and I use it for many things, but it is not cheaper. Subjectivity and opinions based on personal experience with all those models implied, naturally. I assume the 31B Gemma has cases where it edges out; I've just failed to find any, and I have been running all four models mentioned nonstop for different tasks since hours after each of them dropped. Hell, for my hermes, I started getting better results once I switched from Gemma 4 26B to Qwen3.5 9B, not even the massively improved 3.6 series. It just feels outdated/cherry-picked not to use what by many accounts is the current consumer-hardware SOTA when doing such an analysis.
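For anyone curious what that kind of offline diagnostic run looks like in code, here is a minimal sketch, assuming a llama.cpp llama-server instance already serving the 27B on its default OpenAI-compatible endpoint. The model file name, prompt wording, and helper function are placeholders, not the commenter's actual setup:

```python
import requests

# Assumes llama-server is already running locally, e.g.:
#   llama-server -m qwen3.6-27b-q4_k_m.gguf --port 8080
# llama.cpp's server exposes an OpenAI-compatible chat endpoint.
LOCAL_ENDPOINT = "http://localhost:8080/v1/chat/completions"

def diagnose_network(router_host: str, notes: str) -> str:
    """Ask the local model for a network-diagnosis report.

    Placeholder prompt: a real agent setup would also give the model
    tool access (ping, traceroute, the router's admin API) instead of
    working from a text description alone.
    """
    resp = requests.post(
        LOCAL_ENDPOINT,
        json={
            "model": "local",  # llama-server accepts any model name here
            "messages": [
                {"role": "system",
                 "content": "You are a network diagnostics assistant."},
                {"role": "user",
                 "content": f"My internet is down. Router: {router_host}. "
                            f"Observations: {notes}. Produce a step-by-step "
                            f"report with suggested fixes."},
            ],
            "temperature": 0.2,
        },
        timeout=600,  # local 27B inference can take a while
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(diagnose_network("192.168.1.1", "WAN light off, LAN pings fine"))
```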
ekojs 3 hours ago:
Not disagreeing with your argument, but:

> If you want a good dense model, use Qwen3.6 27B instead: speed will be up, and if you don't take my word for it being smarter, let OpenRouter's prices for it versus the bigger, slower, and less memory-efficient Gemma do the talking.

I don't know if that's the correct read. I think those providers are simply taking their cue from Alibaba's first-party pricing for the 27B dense, which is kinda overpriced imo. Perhaps it can be explained by how 'reasoning-inefficient' the Qwen models are (relative to frontier models, or even Gemma): longer sequence lengths are expensive to serve.
trollbridge 4 hours ago:
Right. Qwen 3.6 45B (6B active parameters) runs on a commodity 5090, which, if you're into video games, you probably already have. It is entirely usable for most code-generation tasks (not all, but most). Likewise, DeepSeek V4 Flash is quite accessible locally, with DwarfStar 4 making it easy to run on a 96GB MacBook.

There's nothing wrong with paying for inference, but local models open up some pretty amazing possibilities: entirely offline usage, working on private data (PII, legally privileged material, and the like), or performing tasks with no concern whatsoever for billing overruns. The other possibility is being able to build a service you can be 100% assured will keep running, without worrying about the service going down or being end-of-lifed, which is currently a problem with frontier models. My local Qwen setup is entirely predictable; it can run as long as I can keep finding hardware to run it.

A sensible strategy uses both: have local inference tools available, and use both low-cost and high-cost cloud-based models. Use GPT-5.5 and Opus-4.7 for the demanding reasoning tasks they excel at (including laundering the latter through a Claude subscription to make it cheaper), DeepSeek V4 Pro for slightly less demanding tasks, V4 Flash for most (not all) code generation, and then local models for anything where you want a local model.
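As a sketch of that tiered routing, here is what the dispatch logic might look like, assuming every backend speaks the OpenAI-compatible chat API (as llama-server and most cloud providers do). The URLs, model names, and tier labels are illustrative assumptions, not real endpoints:

```python
import os
import requests

# Illustrative routing table for the tiered strategy described above.
# Endpoint URLs and model names are placeholders, not real services.
BACKENDS = {
    "frontier": ("https://api.example-cloud.com/v1/chat/completions",
                 "frontier-reasoning-model"),
    "mid":      ("https://api.example-cloud.com/v1/chat/completions",
                 "mid-tier-model"),
    "codegen":  ("https://api.example-cloud.com/v1/chat/completions",
                 "flash-model"),
    "local":    ("http://localhost:8080/v1/chat/completions",
                 "local"),
}

def ask(tier: str, prompt: str) -> str:
    """Send a chat request to whichever backend the tier maps to."""
    url, model = BACKENDS[tier]
    headers = {}
    if not url.startswith("http://localhost"):
        # Cloud backends need auth; the local llama-server does not.
        headers["Authorization"] = f"Bearer {os.environ['API_KEY']}"
    resp = requests.post(
        url,
        headers=headers,
        json={"model": model,
              "messages": [{"role": "user", "content": prompt}]},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Route by task: frontier for hard reasoning, cheaper cloud tiers for
# routine codegen, local for anything private or offline.
print(ask("local", "Summarize this contract draft (stays on my machine)."))
```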