quietsegfault 5 hours ago

It’s more than just data locality. OpenRouter is faster, no? I have an M4 Pro, and anything but the smallest, dumbest models is unusably slow for interactive use. I personally haven’t yet found a good use case for offline/non-interactive LLM work locally.

PAndreew 5 minutes ago | parent | next [-]

I’m running a local Whisper + Gemma 4 pipeline with a cheap USB mic to extract health-related data and potential todos from ambient speech. It doesn’t have to be fast and it doesn’t have to be 100% correct, because if it captures even a few bits of interesting information that would otherwise go unnoticed, it’s still a win.
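For the curious, a minimal sketch of that kind of pipeline, assuming the sounddevice, soundfile, faster-whisper, and ollama Python packages plus a Gemma model served by a local Ollama instance; the model tags, chunk length, and prompt below are illustrative guesses, not the actual setup described above:

    # Ambient-capture loop: record a chunk, transcribe, extract items.
    # Assumes the USB mic is the default input device and an Ollama
    # server is running locally; "gemma3:4b" stands in for whatever
    # Gemma variant is actually used.
    import sounddevice as sd
    import soundfile as sf
    from faster_whisper import WhisperModel
    import ollama

    SAMPLE_RATE = 16_000
    CHUNK_SECONDS = 60  # a minute of audio per pass; speed doesn't matter

    whisper = WhisperModel("small", compute_type="int8")

    def capture_chunk(path="chunk.wav"):
        audio = sd.rec(int(CHUNK_SECONDS * SAMPLE_RATE),
                       samplerate=SAMPLE_RATE, channels=1)
        sd.wait()  # block until the recording finishes
        sf.write(path, audio, SAMPLE_RATE)
        return path

    def extract_items(transcript):
        # Ask the local model for health mentions and todos; wrong or
        # empty answers are acceptable, per the "still a win" framing.
        resp = ollama.chat(model="gemma3:4b", messages=[{
            "role": "user",
            "content": "From this transcript, list any health-related "
                       "observations and any todos. Say NONE if there "
                       "are none.\n\n" + transcript,
        }])
        return resp["message"]["content"]

    while True:
        segments, _ = whisper.transcribe(capture_chunk())
        text = " ".join(s.text for s in segments).strip()
        if text:
            print(extract_items(text))

The nice property of this design is that every stage is batch-tolerant: each pass only has to keep up with a minute of speech per minute of wall-clock time, so even a slow local model is fast enough.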

novok 23 minutes ago | parent | prev | next [-]

I played with classifying and summarizing my entire email history (per email) with small models, and that only took about 12h of GPU time at most. Using a coding-agent CLI wrapper for that kind of job is far slower because of all the spin-up cost and the system prompt they inject, even if you try to turn it all off.

If I'd used a direct API it probably would've been much faster, but I'm doing it for hobby/fun reasons. You also get to fiddle with a lot more params.
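As a rough sketch of what the direct-API version could look like, assuming a local OpenAI-compatible server (llama.cpp server, vLLM, Ollama, etc.) on localhost; the endpoint URL, model name, and label set are placeholders:

    # Per-email classification against a local OpenAI-compatible
    # endpoint. Everything here is a placeholder sketch, not the
    # commenter's actual pipeline.
    import json
    import mailbox
    import urllib.request

    URL = "http://localhost:8000/v1/chat/completions"
    LABELS = "receipt, newsletter, personal, work, spam"

    def classify(body):
        payload = {
            "model": "local-small-model",
            "messages": [
                {"role": "system",
                 "content": "Classify this email as one of: "
                            + LABELS + ". Reply with the label only."},
                {"role": "user", "content": body[:4000]},
            ],
            # Direct API access: you pick the sampling params, and
            # nothing injects a system prompt or agent scaffolding.
            "temperature": 0.0,
            "max_tokens": 8,
        }
        req = urllib.request.Request(
            URL, data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as r:
            return json.load(r)["choices"][0]["message"]["content"].strip()

    for msg in mailbox.mbox("archive.mbox"):
        # Naive body extraction; multipart handling omitted for brevity.
        body = msg.get_payload(decode=True) or b""
        print(msg["subject"], "->", classify(body.decode(errors="ignore")))

No agent spin-up, no injected prompt, and the requests are trivially parallelizable, which is where most of the speedup over a CLI wrapper would come from.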

datadrivenangel 4 hours ago | parent | prev | next [-]

Yeah. Speed is the biggest issue. The intelligence of open models is good enough for serious work (though still worse than the frontier models), but the cloud models are often 3-7x faster, and with more parallelization you can get speeds on the order of hundreds of tokens per second, which makes things feel fast!

freeopinion 2 hours ago | parent [-]

Even extremely slow LLMs can generate Part B faster than I can audit Part A. So the LLM can generate Part A while I look over my email. Then it can worry over Part B while I look over Part A.

It can worry over Part C while I have my 10:30 group meet. And it can worry over Part D while I do whatever other silly, time-wasting thing all humans do in almost all organizations. Then I still haven't reviewed Part B, yet, so the extremely slow AI is waiting on me.

Maybe someday I'll be good enough to need faster AI so I can rewrite something like Bun in a few days. Right now, slow and local fits my use case very well.

quietsegfault an hour ago | parent [-]

I don’t think it matters whether you’re “good enough” or not. Much of AI development is iterative. If you context-switch from A in project 1 to B in project 2, back to check A, then maybe to C while B finishes up, you lose the flow state that fast AI assistance can enable for those of us who are not fluent coders.

Sure, I can wait hours for my local model to finish, or I can spend basically the same amount and get the answer right away.

There’s a lot of exciting stuff happening with local LLMs despite the speed, but personally I don’t have the discipline or working memory to jump from project to project.

threatofrain 2 hours ago | parent | prev [-]

And continuing the “more than just...” argument: if you stop running inference on your Mac, you still have a generally nice computer. That’s the difference between rent vs. buy.