▲ | punitvthakkar 4 days ago
So far I've not run into the kind of use cases that local LLMs can convincingly handle without making me feel like I'm using the first ChatGPT from 2022: they are limited and quite limiting. I am curious about what use cases the community has found that work for them. The example one user has given in this thread, of their local LLM inventing a Sun Tzu interview, is exactly the kind of limitation I'm talking about. How does one use a local LLM to do something actually useful?
▲ | narrator 4 days ago | parent | next [-]
I have tried a lot of different LLMs, and Gemma3:27b on a 48 GB+ MacBook is probably the best for analyzing diaries and personal material you don't want to share with the cloud. The Chinese models are comically bad at life advice. For example, I asked DeepSeek to read my diaries and talk to me about my life goals, and it told me, in a very Confucian manner, what the proper relationships in my life were for my stage of life and station in society. Gemma is much more Western.
▲ | crazygringo 4 days ago | parent | prev | next [-]
I see local LLMs being used mainly for automation as opposed to factual knowledge -- for classification, summarization, search, and things like grammar checking. So they need to be smart about your desired language(s) and all the everyday concepts we use in them (so they can understand the content of documents and messages), but they don't need any of the detailed factual knowledge about human history, programming languages and libraries, health, and everything else. The idea is that you don't prompt the LLM directly; your OS tools make use of it, and applications prompt it as frequently as they fetch URLs.
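A minimal sketch of that pattern, assuming an OpenAI-compatible local server (e.g. Ollama) on localhost; the model name and label set are illustrative, not anyone's actual setup:

    # App-side classification with a small local model: no cloud call,
    # no factual knowledge needed, just language understanding.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

    def classify(message: str) -> str:
        resp = client.chat.completions.create(
            model="gemma3:4b",  # assumption: any small local instruct model
            messages=[
                {"role": "system",
                 "content": "Classify the message as one of: invoice, "
                            "newsletter, personal, spam. Reply with the label only."},
                {"role": "user", "content": message},
            ],
            temperature=0,
        )
        return resp.choices[0].message.content.strip().lower()

    print(classify("Your March hosting bill of $12.40 is attached."))  # -> invoice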
▲ | dxetech 4 days ago | parent | prev | next [-]
There are situations where internet access is limited, or where there are frequent outages. An outdated LLM might be more useful than none at all. For example: my internet is out due to a severe storm; what safety precautions do I need to take?
▲ | vorticalbox 4 days ago | parent | prev | next [-]
I keep a lot of notes in Obsidian: all my thoughts and feelings, both happy and sad, things I've done, etc. These are deeply personal and I don't want them going to a cloud provider even if they "say" they don't train on my chats. I forget a lot of things, so I feed the notes into ChromaDB and then use an LLM to chat with them. I've started using abliterated models, which have their refusal behaviour removed [0].

The other use case is work. I work with financial data and have created an MCP server that automates some of my job. Running the model locally means I don't have to worry about the information I feed it.

[0] https://github.com/Sumandora/remove-refusals-with-transforme...
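A rough sketch of the retrieval half of a setup like this, assuming ChromaDB with its default embedder and an Ollama server on localhost; the vault path and model name are illustrative:

    # Index an Obsidian vault into ChromaDB, then answer questions
    # against the top matching notes with a local model.
    from pathlib import Path
    import chromadb
    from openai import OpenAI

    db = chromadb.PersistentClient(path="./notes_db")
    notes = db.get_or_create_collection("obsidian")

    for f in Path("~/vault").expanduser().rglob("*.md"):  # hypothetical path
        notes.add(ids=[str(f)], documents=[f.read_text(encoding="utf-8")])

    def ask(question: str) -> str:
        hits = notes.query(query_texts=[question], n_results=5)
        context = "\n---\n".join(hits["documents"][0])
        llm = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
        resp = llm.chat.completions.create(
            model="gemma3:27b",  # any local instruct model
            messages=[{"role": "user",
                       "content": f"Using only these notes:\n{context}\n\nQ: {question}"}],
        )
        return resp.choices[0].message.content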
▲ | jondwillis 4 days ago | parent | prev | next [-]
I use, or at least try to use, local models while prototyping/developing apps.

First, they control costs during development, which, depending on what you're doing, can get quite expensive for low- or no-budget projects.

Second, they force me to work within more constraints and compose things more carefully. If a local model (albeit one somewhat capable, like gpt-oss or Qwen3) can start to piece together the agentic workflow I am trying to model, chances are it'll start working quite well, quite quickly, when I switch to even a budget cloud model (something like gpt-5-mini). However, dealing with these constraints might not be worth the time if you can stuff all of your documents into the context window of a cloud model and get good results, though it will probably be cheaper and faster on an ongoing basis to have split the task up.
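What makes this workflow cheap to keep is that most local servers expose the same OpenAI-compatible API as the cloud providers, so switching is one base URL. A sketch, with the endpoint and model names as assumptions:

    # Develop against a local server; flip one env var to run the same
    # code against a cloud provider when the workflow starts working.
    import os
    from openai import OpenAI

    if os.environ.get("USE_CLOUD"):
        client = OpenAI()                # reads OPENAI_API_KEY from the env
        model = "gpt-5-mini"             # budget cloud model
    else:
        client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
        model = "gpt-oss:20b"            # local model via Ollama

    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Summarize this ticket: ..."}],
    )
    print(resp.choices[0].message.content)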
▲ | dragonwriter 4 days ago | parent | prev | next [-]
Well, a lot of what is possible with local models depends on what your local hardware is, but Docling is a pretty good example of a library that can use local models (VLMs rather than regular LLMs) “under the hood” for productive tasks.
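For reference, Docling's basic usage is small; this is roughly its quickstart shape (the file name is just an example):

    # Convert a document to Markdown; Docling runs its layout/VLM
    # models locally under the hood.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert("report.pdf")
    print(result.document.export_to_markdown())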
▲ | ivape 4 days ago | parent | prev | next [-]
I use Claude Code in the terminal only, mostly to figure out what to commit and what to write for the commit message. I believe a solid 7-8B model can do this locally. So that's at least one small, highly useful workflow robot I have a use for (and very easy to cook up on your own). I also have a use for terminal command autocompletion, which again, a small model can be great for. Something felt really wrong about sending entire folder contents over to Claude online, so I am absolutely looking to create the toolkit locally. The universe of offline is just getting started, and these big companies are literally telling you "watch out, we save this stuff".
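A sketch of that commit-message robot, assuming an Ollama server on localhost; the model choice and the diff-size cap are arbitrary:

    # Suggest a commit message for whatever is currently staged.
    import subprocess
    from openai import OpenAI

    diff = subprocess.run(["git", "diff", "--staged"],
                          capture_output=True, text=True).stdout
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
    resp = client.chat.completions.create(
        model="qwen2.5-coder:7b",  # assumption: any ~7B local code model
        messages=[
            {"role": "system",
             "content": "Write a one-line conventional commit message for this diff."},
            {"role": "user", "content": diff[:8000]},  # keep the context small
        ],
    )
    print(resp.choices[0].message.content)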
▲ | rukuu001 4 days ago | parent | prev | next [-]
I'm running Gemma3-270M locally (MLX). I've got a Python script that pulls down emails based on a whitelist and summarises them; the 270M model does a good job of this. It runs in a terminal, and it means I barely look at my email during the day.
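The summarising step might look roughly like this with mlx_lm; the model repo id is an assumption, and the email-fetching half is omitted:

    # Summarize one email body with a tiny local model on Apple silicon.
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/gemma-3-270m-it-bf16")  # assumed repo id

    def summarize(email_body: str) -> str:
        prompt = f"Summarize this email in two sentences:\n\n{email_body}\n\nSummary:"
        return generate(model, tokenizer, prompt=prompt, max_tokens=120)

    print(summarize("Hi team, the Q3 review has moved to Thursday 2pm..."))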
▲ | luckydata 4 days ago | parent | prev | next [-]
Local models can do embedding very well, which is useful for things like building a screenshot manager.
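A minimal sketch of that idea over OCR'd screenshot text, using sentence-transformers (model choice illustrative); it runs fully offline once the model is downloaded:

    # Embed screenshot text once, then find screenshots by meaning.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")
    screens = {
        "shot_001.png": "Invoice from AWS, total $84.33, due May 1",
        "shot_002.png": "Slack thread about the staging deploy failure",
    }
    corpus = model.encode(list(screens.values()), convert_to_tensor=True)
    query = model.encode("cloud bill", convert_to_tensor=True)
    best = util.cos_sim(query, corpus).argmax().item()
    print(list(screens.keys())[best])  # -> shot_001.png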
▲ | bityard 4 days ago | parent | prev | next [-]
I use a local LLM for lots of little things that I used to use search engines for: defining words, looking up Unicode symbols for copy/paste, reminders on how to do X in bash or Python. Sometimes I use it as a starting point for high-level questions and curiosity, then move to actual human content or larger online models for more detail and/or fact-checking if needed. If your computer is somewhat modern and has a decent amount of RAM to spare, it can probably run one of the smaller-but-still-useful models just fine, even without a GPU.

My reasons:

1) Search engines are actively incentivized not to show useful results. SEO-optimized clickbait articles contain long, fluffy, contentless prose intermixed with ads. The longer they can keep you "searching" for the information instead of "finding" it, the better it is for their bottom line: if you actually find what you're looking for, you close the tab and stop looking at ads; if you don't, you keep scrolling and generate more ad revenue for the advertisers and search engines. It's exactly the same reason online dating sites are futile for most people: every successful match costs them two customers, which is bad for revenue. LLMs (even local ones, in some cases) are quite good at giving direct answers to direct questions, which is 90% of what I used search engines for to begin with. Yes, sometimes they hallucinate. No, it's not usually a big deal if you apply some common sense.

2) Most datacenter-hosted LLMs don't have ads built into them now, but they will. As soon as we get used to "trusting" hosted models because of how good they've become, the model developers and operators will figure out how to turn the model into a sneaky salesman. You'll ask it for the specs on a certain model of Dell laptop and it will pretend it didn't hear you and reply, "You should try HP's latest line of business-class notebooks; they're fast, affordable, and come in 5 fabulous colors to suit your unique personal style!" I want to emphasize that it's not IF this happens, it's WHEN. Local LLMs COULD carry advertising at some point, but it will probably be rare and/or weird, since these smaller models are meant mainly for development and further experimentation. I have faith that some open-weight models will always exist in some form, even if they never rival commercially hosted models in overall quality.

3) I've made peace with the fact that data privacy in the age of Big Tech is a myth, but that doesn't mean I can't minimize my exposure by keeping some of my random musings and queries to myself. Self-hosted AI models will never be as "good" as the ones hosted in datacenters, but they are still plenty useful.

4) I'm still in the early stages of this, but I can develop my own tools around small local models without paying a hosted-model provider and/or becoming their product.

5) I was a huge skeptic about the overall value of AI during all of the initial hype. Then I realized that this stuff isn't some fad that will disappear tomorrow. It will get better. The experience will get more refined. It will get more accurate. It will consume less energy. It will be totally ubiquitous. If you fail to come up to speed on an important new technology or trend, you will be left in the dust by those who do. I understand the skepticism and pushback, but the future moves forward regardless.
▲ | jeffybefffy519 4 days ago | parent | prev | next [-]
Gemma3 is pretty useful on a long-haul flight without internet.
▲ | ActorNightly 3 days ago | parent | prev | next [-]
Smaller models require a lot more direction, a.k.a. system prompt engineering, and sometimes custom wrappers. For example, Gemma models are very eager to generate code even if you tell them not to.
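One shape such a wrapper can take: a strict system prompt plus a post-filter that strips code blocks the model emits anyway. The endpoint and model name are assumptions:

    # Wrapper for an over-eager small model: forbid code in the prompt,
    # then remove any fenced blocks it produces regardless.
    import re
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
    SYSTEM = ("Answer in plain prose. Do not write code, pseudocode, "
              "or shell commands under any circumstances.")

    def ask_prose_only(question: str) -> str:
        resp = client.chat.completions.create(
            model="gemma3:12b",  # illustrative
            messages=[{"role": "system", "content": SYSTEM},
                      {"role": "user", "content": question}],
        )
        text = resp.choices[0].message.content
        return re.sub(r"```.*?```", "[code removed]", text, flags=re.DOTALL)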
▲ | bigyabai 4 days ago | parent | prev | next [-]
Qwen3 A3B (in my experience) writes code as good as ChatGPT 4o and much better than GPT-OSS.
▲ | mentalgear 4 days ago | parent | prev | next [-]
Something like Rewind or OpenRecall can use local LLMs for on-device semantic search.
▲ | segmondy 4 days ago | parent | prev [-]
The same way you use a cloud LLM.