SwellJoe 4 hours ago

I opted to buy a normal 32GB laptop for this very reason. I know how loud and hot the GPUs in my desktop run when running even smallish models like Qwen 27B or Gemma 4 31B (which is a better model for most people than Qwen 3.6, despite the benchmarks). I also have a Strix Halo which doesn't get loud, because it has a single huge fan, but it does get hot. So, there's no way a laptop could work as hard as models make it work and not be unbearable. Tiny fans trying to remove all that heat? They gotta be screaming. No reason to spend all that money on a laptop that I couldn't realistically make use of. I do run a lot of VMs on my desktop, but I can get to those on a VPN.

It's a nice idea to run a model on a laptop so you can work anywhere...but that's a job for models in the cloud. Not much data has to traverse the network, so it's not a big deal. Or you could set up a VPN so you can reach a self-hosted model on a big box at home for things that require data privacy.

All that said, there are models that work great on very small devices for some tasks and won't work them to death. Gemma 4 12B QAT 4-bit runs on a 16GB device, maybe even smaller, including a tablet. It's the best self-hostable vision model I've tested for my purposes (categorization, identification, labeling, that type of stuff), beating much larger models. It's also a decent conversationalist with good prose, but it doesn't know much of anything (not a lot of the world fits in 7GB), so it needs search if you want to use it for research. It's a pretty good tool user. I definitely wouldn't want to use it for code, though, beyond very simple stuff.
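
For anyone wondering where a figure like 7GB comes from, here's a rough back-of-the-envelope sketch. It only counts the weights (parameters times bits per weight), and the function name is just something made up for illustration. Real model files usually come out somewhat larger than this, since some layers and the quantization metadata tend to be stored at higher precision, and you still need room for the KV cache and the runtime on top.

```python
def quantized_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes).

    Hypothetical helper: ignores quantization scales, higher-precision
    layers, KV cache, and runtime overhead.
    """
    # billions of params * bits each / 8 bits per byte = billions of bytes
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    for name, params in [("12B", 12), ("27B", 27), ("31B", 31)]:
        print(f"{name} at 4-bit: ~{quantized_weight_gb(params, 4):.1f} GB of weights")
```

So a 12B model at 4 bits is about 6 GB of weights before overhead, which is in the same ballpark as the 7GB mentioned above and is why it fits on a 16GB device with room to spare.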

girvo 3 hours ago | parent [-]

Gemma is better than Qwen at everything except coding, in all my evaluations. Which is a shame because that is what I use them for!

UncleOxidant 2 hours ago | parent | next [-]

It would be great if the Gemma folks would release a code-focused model. Probably won't happen, but it's fun to dream.

SwellJoe an hour ago | parent [-]

The Ornith folks say they're doing that, but haven't released the Gemma-based 31b yet (https://github.com/deepreinforce-ai/Ornith-1). Also, the Qwen-based 35b MoE Ornith version performs worse than Qwen 3.6 and Qwen AgentWorld on my benchmarks (which are focused on finding security bugs, so not exactly the same as agentic coding, but closely related skills).

That said, the reason they're able to release Ornith-branded post-trains of both Gemma and Qwen is that both are open weights under a friendly license. Anyone, not just Google, could make a coding-focused Gemma post-train. I don't think Gemma is actually much weaker than Qwen 3.6 for coding; Gemma 4 31b outperforms Qwen 3.6 27b by a wide margin on security bug hunting (at least for the specific bugs in my benchmarks, which are mostly relatively difficult bugs from the Mythos-reported bugs).

I'd really love to see a bigger MoE from Google, though. A 70b or 120b MoE would likely be super fun.

ekianjo 34 minutes ago | parent | prev | next [-]

Gemma is also worse for tool calling, not just coding.

an hour ago | parent | prev [-]
[deleted]