graeme 3 days ago

We have a ton of good, small models. The issues are:

1. Most people don't have machines that can run even midsized local models well

2. The local models aren't nearly as good as the frontier models for a lot of use cases

3. There are technical hurdles to running local models that will block 99% of people. Even if the steps are: download LM Studio and download a model

Maybe local models will get so good that they cover 99% of normal user use cases and it'll be like using your phone/computer to edit a photo. But you'll still need something to make it automatic enough that regular people use it by default.

That said, anyone reading this is almost certainly technical enough to run a local model. I would highly recommend trying some. It's very neat knowing it's running entirely on your machine and seeing what it can do. LM Studio is the most brainless way to dip your toes in.
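For example, once you've loaded a model, LM Studio can expose a local OpenAI-compatible server (on http://localhost:1234 by default), so you can script against it like any hosted API. A minimal sketch - the model name is just a placeholder for whatever you've loaded, not a recommendation:

    // Node 18+ / Deno sketch: query LM Studio's local OpenAI-compatible endpoint.
    // Nothing here leaves your machine.
    const res = await fetch("http://localhost:1234/v1/chat/completions", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "local-model", // placeholder: whichever model is loaded in LM Studio
        messages: [{ role: "user", content: "Explain what a context window is." }],
      }),
    });
    const data = await res.json();
    console.log(data.choices[0].message.content);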

loyalcinnamon 2 days ago | parent | next

As the hype dies down, it's becoming a little clearer that AI isn't like blockchain and might actually be useful (for non-generative purposes, at least)

I'm curious what counts as a midsize model; 4B, 8B, or something larger/smaller?

What models would you recommend? I have 12GB of VRAM, so anything larger than 8B might be really slow, but I'm not sure

riskable 2 days ago | parent | next

My take:

Large: Requires >128GB VRAM

Medium: 32-128GB VRAM

Small: 16GB VRAM

Micro: Runs on a microcontroller or GPUs with just 4GB of VRAM

There's really nothing worthwhile for general use cases that runs in under 16GB (from my testing) except a grammar-checking model that I can't remember the name of at the moment.

gpt-oss:20b runs on 16GB of VRAM and it's actually quite good (for coding, at least)! Especially with Python.
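If you're running it through Ollama (where the gpt-oss:20b tag comes from), a rough sketch of asking it for Python from a script - this assumes the model has already been pulled and the Ollama server is on its default port:

    // Assumes `ollama pull gpt-oss:20b` has been done and the server is on the default port.
    const res = await fetch("http://localhost:11434/api/generate", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "gpt-oss:20b",
        prompt: "Write a Python function that merges two sorted lists.",
        stream: false, // one JSON object back instead of a token stream
      }),
    });
    const data = await res.json();
    console.log(data.response);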

Prediction: The day that your average gaming PC comes with 128GB of VRAM is the day developers will stop bothering with cloud-based AI services. gpt-oss:120b is nearly as good as GPT-5, and we're still at the beginning of the AI revolution.

DSingularity 2 days ago | parent | prev

It depends on your use case. Are you editing a large codebase, and will you thus be making lots of completion requests with large contexts?

FitchApps 2 days ago | parent | prev

Try WebLLM - it's pretty decent and runs entirely in-browser/offline, at least for light tasks with 1B-1.5B models like Qwen2.5-Coder-1.5B-Instruct. I put together a quick prototype - CodexLocal.com - but you can essentially run a local nginx and use WebLLM as an offline app. Of course, you can just use Ollama / LM Studio, but that would require a more technical setup
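Roughly what that looks like with the web-llm package - the exact model id string here is from memory and should be checked against WebLLM's current prebuilt model list:

    import { CreateMLCEngine } from "@mlc-ai/web-llm";

    // Downloads and compiles the model in the browser (cached after the first run),
    // then exposes an OpenAI-style chat API that runs entirely client-side.
    const engine = await CreateMLCEngine(
      "Qwen2.5-Coder-1.5B-Instruct-q4f16_1-MLC", // assumed prebuilt model id
      { initProgressCallback: (p) => console.log(p.text) },
    );

    const reply = await engine.chat.completions.create({
      messages: [{ role: "user", content: "Write a TypeScript debounce helper." }],
    });
    console.log(reply.choices[0].message.content);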