LM Studio 0.4.0 (lmstudio.ai)
58 points by jiqiren 2 hours ago | 28 comments
ssalka 16 minutes ago | parent | next [-]

Personally, I would not run LM Studio anywhere outside my local network, as it still doesn't support adding an SSL cert. You can layer a proxy server on top of it, but for something that's meant to be easy to set up, this seems like a quick win, and I don't see any reason not to build support for it.

https://github.com/lmstudio-ai/lmstudio-bug-tracker/issues/1...

dmd a minute ago | parent | next [-]

Adding Caddy as a proxy server is literally one line in Caddyfile, and I trust Caddy to do it right once more than I trust every other random project to add SSL.
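
For anyone curious, a minimal sketch of that Caddyfile (the hostname is a placeholder, and I'm assuming LM Studio's default server port of 1234):

  lmstudio.example.com {
      reverse_proxy localhost:1234
  }

Caddy obtains and renews the certificate automatically for a publicly resolvable hostname.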

jermaustin1 10 minutes ago | parent | prev [-]

Because adding Caddy/nginx/Apache + Let's Encrypt is a couple of bash commands between install and setup, and those HTTP servers' TLS termination is going to be 100x better than anything LMS adds itself, since it isn't their core competency.
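
For example, a rough sketch with nginx + certbot on a Debian-ish box (package names and domain are placeholders, and you'd still point the nginx site config at LM Studio's local port):

  sudo apt install nginx certbot python3-certbot-nginx
  sudo certbot --nginx -d lmstudio.example.com
  # then add a proxy_pass to LM Studio's local port (default 1234) in the site config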

syntaxing an hour ago | parent | prev | next [-]

I’m really excited about lmster and to try it out. It’s essentially what I want from Ollama. Ollama has deviated so much from its original core principles: it has been broken and slow to update model support, and there’s a “vendor sync” (essentially updating ggml) I’ve been waiting on for weeks.

minimaxir an hour ago | parent | prev | next [-]

LMStudio introducing a command line interface makes things come full circle.

Helithumper an hour ago | parent [-]

For context, LM Studio has had a CLI for a while; it just required the desktop app to already be open. This makes it so you can run LM Studio properly headless, not just from a terminal while the desktop app is open.

`lms chat` has existed; `lms daemon up` / "llmster" is the new command.
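
Roughly, a headless workflow would look like this (a sketch: `lms daemon up` is the new 0.4.0 command, the rest are existing lms subcommands, and exact names/flags may differ):

  lms daemon up            # start the headless service, no desktop app needed
  lms ls                   # list locally downloaded models
  lms load <model-key>     # load a model into the running daemon
  lms chat                 # chat from the terminal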

embedding-shape 33 minutes ago | parent [-]

> This makes it where you can run LMStudio properly headless and not just from a terminal while the desktop app is open

Ah, this is great, been waiting for this! I naively built some tooling on top of the desktop app's API after seeing they had a CLI; then, once I wanted to deploy and run it on a server, I was very confused to find that the CLI is installed by the desktop app and requires the desktop app to be running.

Great that they finally got it working fully headless now :)

saberience an hour ago | parent | prev | next [-]

What’s the main use-case for this?

I get that I can run local models, but all the paid-for (remote) models are superior.

So is the use-case just for people who don’t want to use big tech’s models? Is this just for privacy-conscious people? Or is this just for “adult” chats, i.e. porn bots?

Not being cynical here, just wanting to understand the genuine reasons people are using it.

biddit 37 minutes ago | parent | next [-]

Yes, frontier models from the labs are a step ahead and likely will always be, but we've already crossed levels of "good enough for X" with local models. This is analogous to the fact that my iPhone 17 is technically superior to my iPhone 8, but my outcomes for text messaging are no better.

I've invested heavily in local inference. For me, it's a mixture of privacy, control, stability, and cognitive security.

Privacy - my agents can work on tax docs, personal letters, etc.

Control - I do inference steering with some projects: constraining which token can be generated next at any point in time. Not possible with API endpoints (a rough sketch of the idea follows below).

Stability - I had many bad experiences with frontier labs' inference quality shifting within the same day, likely due to quantization under system load. Worse, they retire models, update their own system prompts, etc. They're not stable.

Cognitive Security - This has become more important as I rely more on my agents for performing administrative work. This is intermixed with the Control/Stability concerns, but the focus is on whether I can trust it to do what I intended it to do, and that it's acting on my instructions, rather than the labs'.
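
To illustrate the control point above, here's a minimal sketch of one way to constrain generation locally, using llama.cpp's GBNF grammars to restrict which tokens can be sampled next (the model path is a placeholder, and this is just an example technique, not necessarily the setup described above):

  # write a grammar that only permits a yes/no answer
  echo 'root ::= "yes" | "no"' > yesno.gbnf
  ./llama-cli -m <path/to/model.gguf> --grammar-file yesno.gbnf -p "Is 17 prime? Answer: " -n 8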

konart 30 minutes ago | parent | prev | next [-]

For many tasks you don't really need big models. And a relatively small model, quantized too, can run on your MacBook (not to mention a Mac Studio).

reactordev an hour ago | parent | prev | next [-]

Not always. Besides, this allows one to use a post-trained model, a heretic model, an abliterated model, or their own.

I exclusively run local models. On par with Opus 4.5 for most things. gpt-oss is pretty capable. Qwen3 as well.

tiderpenger an hour ago | parent | prev | next [-]

To justify investing a trillion dollars, like everything else LLM-related. The local models are pretty good. I ran a test on R1 (the smallest version) vs Perplexity Pro and shockingly got better answers running on a base-spec Mac Mini M4. It's simply not true that there is a huge difference. Mostly it's hard-coded over-optimization. In general these models aren't really becoming better.

mk89 33 minutes ago | parent [-]

I agree with this comment here.

For me the main BIG deal is that cloud models have online search etc. built in, while this one doesn't.

However, if you don't need that (e.g., translating, summarizing text, writing code), it's probably good enough.

prophesi 23 minutes ago | parent [-]

So long as the local model supports tool-use, I haven't had issues with them using web search etc in open-webui. Frontier models will just be smarter in knowing when to use tools.
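
For reference, tool use over a local OpenAI-compatible endpoint looks roughly like this (a sketch assuming LM Studio's default server on port 1234, a placeholder model name, and a hypothetical web_search function; open-webui wires this up for you):

  curl http://localhost:1234/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "your-local-model",
      "messages": [{"role": "user", "content": "What changed in LM Studio 0.4.0?"}],
      "tools": [{
        "type": "function",
        "function": {
          "name": "web_search",
          "description": "Search the web for up-to-date information",
          "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"]
          }
        }
      }]
    }'

If the model decides a search is needed, it replies with a tool_calls block; the client runs the search and feeds the result back as a tool message.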

mk89 13 minutes ago | parent [-]

Ok I need to explore this, I didn't do it yet. Thanks.

anonym29 33 minutes ago | parent | prev [-]

TL;DR: The classic CIA triad: Confidentiality, Integrity, Availability; cost/price concerns; the leading open-weight models aren't nearly as bad as you might think.

You don't need LM Studio to run local models; it was (formerly) just a nice UI to download and manage HF models and llama.cpp updates, and to quickly and easily switch between CPU / Vulkan / ROCm / CUDA manually (depending on your platform).

Regarding your actual question, there are several reasons.

First off, your allusion to privacy - absolutely, yes, some people use it for adult role-play; however, consider the more productive motivations for privacy, too: a lot of businesses have trade secrets they may want to discuss or work on with local models without ever releasing that information to cloud providers, no matter how much those cloud providers pinky-promise to never peek at it. Google, Microsoft, Meta, et al. have consistently demonstrated that they do not value or respect customer privacy expectations, and that they will eagerly comply with illegal, unconstitutional NSA conspiracies to facilitate bulk collection of customer information/data. There is no reason to believe Anthropic, OpenAI, Google, or xAI would act any differently today. In fact, there is already a standing court order forcing OpenAI to preserve all customer communications, in a format that can be delivered to the court (i.e. plaintext, or encryption at rest plus willingness to provide decryption keys to the court), in perpetuity (https://techstartups.com/2025/06/06/court-orders-openai-to-p...)

There are also businesses which have strict, absolute needs for 24/7 availability and low latency, which remote APIs have never offered. Even if the remote APIs were flawless, and even if the businesses have a robust multi-WAN setup with redundant UPS systems, network downtime or even routing issues are more or less an inevitable fact of life, sooner or later. Having local models means you have inference capability as long as you have electricity.

Consider, too, the integrity front: frontier labs may silently modify API-served models to be lower quality for heavy users, with little means of detection by end users (multiple labs have been suspected/accused of this; a lack of proof isn't evidence that it didn't happen), and API-served models can be modified over time to patch behaviors that may have previously been relied upon for legitimate workloads (imagine a red team that used a jailbreak to get a model to produce code for process hollowing, for instance). This second example absolutely has happened with almost every inference provider.

The open-weight local models also have zero marginal cost besides electricity once the hardware is present, unlike PAYG API models, which create financial lock-in and dependency that is directly at odds with the financial interests of the customers. You can argue about the amortized cost of hardware, but at the end of the day that's a decision for the customer to make using their own specific financial and capex/hardware information, which you don't have.

Further, the gap between frontier open weight models and frontier proprietary models has been rapidly shrinking and continues to. See Kimi K2.5, Xiaomi MiMo v2, GLM 4.7, etc. Yes, Opus 4.5, Gemini 3 Pro, GPT-5.2-xhigh are remarkably good models and may beat these at the margin, but most work done via LLMs does not need the absolute best model; many people will opt for a model that gets 95% of the output quality of the absolute frontier model when it can be had for 1/20th the cost (or less).

huydotnet 35 minutes ago | parent | prev | next [-]

I was hoping for the /v1/messages endpoint to use with Claude Code without any extra proxies :(

anonym29 25 minutes ago | parent [-]

This is a breeze to do with llama.cpp, which has had support for the Anthropic /v1/messages API for over a month now.

On your inference machine:

  you@yourbox:~/Downloads/llama.cpp/bin$ ./llama-server -m <path/to/your/model.gguf> --alias <your-alias> --jinja --ctx-size 32768 --host 0.0.0.0 --port 8080 -fa on
Obviously, feel free to change your port, context size, flash attention, other params, etc.

Then, on the system you're running Claude Code on:

  export ANTHROPIC_BASE_URL=http://<ip-of-your-inference-system>:<port>
  export ANTHROPIC_AUTH_TOKEN="whatever"
  export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1
  claude --model <your-alias> [optionally: --system "your system prompt here"]
Note that the auth token can be whatever value you want, but it does need to be set; otherwise a fresh CC install will still prompt you to log in / auth with Anthropic or Vertex/Azure/whatever.

thousand_nights 43 minutes ago | parent | prev | next [-]

man they really butchered the user interface, the "dark" mode now isn't even dark, it's just grey, and it's looking more like a whitespacemaxxed children's toy than a tool for professionals

konart 33 minutes ago | parent [-]

Right now it looks like VS Code (give or take). Pretty sure both are/will be used by many professionals.

"looks like a toy" has very little to do with its use anyway.

jiqiren 2 hours ago | parent | prev | next [-]

This release introduces parallel requests with continuous batching for high-throughput serving, an all-new non-GUI deployment option, a new stateful REST API, and a refreshed user interface.

observationist an hour ago | parent [-]

Awesome - having the API, MCP integrations, and a refined CLI gives you everything you might want. I have some things I'd wanted to try with ChainForge and LMStudio that are now almost trivial.

Thanks for the updates!

khimaros 18 minutes ago | parent | prev | next [-]

this is not open source

adastra22 14 minutes ago | parent [-]

What’s the best open source alternative?

behnamoh 34 minutes ago | parent | prev | next [-]

lmster is what was lacking in lmstudio (yes, they have lms, but it lacks so much of the functionality the GUI version has).

but it's a bit too little, too late. people running this can probably already set up llama.cpp pretty easily.

lmstudio also has some overhead like ollama; llama.cpp or mlx alone are always faster.

anonym29 an hour ago | parent | prev [-]

1. Can't select between Vulkan or ROCm anymore.

2. Cannot manage or see Vulkan or ROCm versions / updates anymore.

3. This breaks a familiar UI to introduce a worse one with missing features.

This kills the value proposition for me, and reeks of enshittification - hiding important features used by power users and devs, transitioning towards a consumerized product rather than a tool, not unlike what happened to ollama.

Back to plain llama.cpp I go. I will not be recommending LMS to anyone until these are fixed.

nunodonato 40 minutes ago | parent [-]

woah dude, take it easy. There are no missing features, there are more features. You might just not be finding them where they were before. Remember this is still 0.x; why would the devs be stuck, unable to improve the UI, just because of past decisions?

anonym29 31 minutes ago | parent [-]

Tell me where I can select between ROCm and Vulkan. It's not "that selector was moved", it's "that selector is no longer there". Ditto the entire interface for managing the versions of these runtimes. Those are missing features. New features having been added doesn't mean critical features aren't now missing.