evilduck 4 days ago

Yeah you can, so long as you're hosting your local LLM through something with an OpenAI-compatible API (which is a given for almost all local servers at this point, including LM Studio).

https://opencode.ai and https://github.com/QwenLM/qwen-code both allow you to configure any API as the LLM provider.
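To illustrate what "OpenAI-compatible" buys you: the local server exposes the same /v1 endpoints, so any OpenAI client works once you point it at the local base URL. A minimal sketch against LM Studio's default server (port 1234 is LM Studio's default; the model name here is just an example of whatever you happen to have loaded):

    # Minimal sketch: talk to a local OpenAI-compatible server (e.g. LM Studio).
    # Assumes LM Studio's default port 1234; local servers ignore the API key,
    # but the client library requires a non-empty value.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

    resp = client.chat.completions.create(
        model="qwen2.5-coder-7b-instruct",  # whichever model is loaded locally
        messages=[{"role": "user", "content": "Write a hello-world in Go."}],
    )
    print(resp.choices[0].message.content)

Tools like opencode and qwen-code are doing essentially this under the hood, just with their own config for the base URL and model name.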

That said, running agentic workloads on local LLMs will be a short and losing battle against context size if you don't have hardware bought specifically for this purpose. You can get it running, and it will work for several autonomous actions, but not for nearly as long as a hosted frontier model will.
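The context ceiling is mostly KV-cache growth. A back-of-envelope calculation with hypothetical model dimensions (not any specific model's real config) shows why long agentic sessions eat memory quickly:

    # Back-of-envelope KV-cache size; numbers are illustrative assumptions.
    n_layers, n_kv_heads, head_dim = 48, 8, 128   # GQA-style model
    bytes_per_elem = 2                            # fp16/bf16 cache
    ctx_tokens = 128_000                          # a long agentic session

    # Both K and V are cached, hence the factor of 2.
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    total_gib = bytes_per_token * ctx_tokens / 2**30
    print(f"{bytes_per_token / 1024:.0f} KiB per token, "
          f"{total_gib:.1f} GiB of KV cache at {ctx_tokens} tokens")
    # -> ~192 KiB per token, ~23.4 GiB of cache on top of the model weights

That cache sits on top of the quantized weights, which is why consumer GPUs and smaller unified-memory Macs run out of room long before a hosted model would.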

TrajansRow 4 days ago | parent [-]

Unfortunately, IDE integration like this tends to be very prefill intensive (compute-bound rather than memory-bandwidth-bound). That puts Apple Silicon at a disadvantage without the feature we're talking about. Presumably the upcoming M5 will also have dedicated matmul acceleration in the GPU. This could change things substantially in favor of local AI, particularly on mobile devices like laptops.
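Roughly: prefill processes the whole prompt in parallel, so it is limited by raw matmul throughput, while decode re-reads the weights once per generated token, so it is limited by memory bandwidth. A crude roofline estimate, using assumed hardware and model numbers rather than measurements of any real chip, shows the asymmetry:

    # Crude roofline estimate; all numbers are assumptions, not measurements.
    params = 30e9                 # ~30B-parameter model
    bytes_per_param = 0.5         # ~4-bit quantized weights
    prompt_tokens = 20_000        # large agentic system prompt + context

    flops_per_s = 30e12           # assumed usable matmul throughput (FLOP/s)
    mem_bw = 400e9                # assumed memory bandwidth (bytes/s)

    # Prefill: ~2 FLOPs per parameter per prompt token, compute-bound.
    prefill_s = 2 * params * prompt_tokens / flops_per_s
    # Decode: every generated token streams the weights, bandwidth-bound.
    decode_tok_per_s = mem_bw / (params * bytes_per_param)

    print(f"prefill ~{prefill_s:.0f}s, decode ~{decode_tok_per_s:.0f} tok/s")
    # -> prefill ~40s before the first token, decode ~27 tok/s afterwards

In that regime, Apple Silicon's generous memory bandwidth keeps decode usable, but the long wait before the first token is a pure matmul problem, which is exactly what dedicated acceleration would address.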

evilduck 4 days ago | parent [-]

Cline has a new "compact" prompt enabled for its LM Studio integration that greatly alleviates the long-system-prompt prefill problem, especially on Macs, which suffer from low compute (though it disables MCP server usage; presumably the trimmed part of the prompt is what made that work).

It seemed to work better when I tested it, and Cline is supposedly adding it to the Ollama integration as well. I suspect that kind of alternate local configuration will proliferate into adjacent projects like Roo, Kilo, Continue, etc.

Apple adding hardware to speed this up would be even better, at least the next time I buy a new computer.