evilduck | 4 days ago
Yeah you can, so long as you're hosting your local LLM through something with an OpenAI-compatible API (which is a given for almost all local servers at this point, including LM Studio). https://opencode.ai and https://github.com/QwenLM/qwen-code both let you configure any OpenAI-compatible endpoint as the LLM provider. That said, running agentic workloads on local LLMs will be a short and losing battle against context size if you don't have hardware bought specifically for this purpose. You can get it running, and it will handle several autonomous actions, but it won't sustain anywhere near as long a session as a hosted frontier model.
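For what it's worth, "OpenAI-compatible" just means the tool speaks the standard chat-completions API, so anything from opencode to a five-line script can point at the local server. A minimal sketch, assuming LM Studio's default endpoint on localhost:1234 (the model name is only an example of whatever you have loaded):

    # Point the standard OpenAI client at a local OpenAI-compatible server.
    # Assumes LM Studio's default port (1234); most local servers ignore the API key.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:1234/v1",  # local server instead of api.openai.com
        api_key="not-needed-locally",
    )

    resp = client.chat.completions.create(
        model="qwen2.5-coder-7b-instruct",    # example: whichever model is loaded locally
        messages=[{"role": "user", "content": "Write a function that reverses a string."}],
    )
    print(resp.choices[0].message.content)

opencode and qwen-code are doing essentially the same thing; you just hand them the base URL and model name in their config instead of in code.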
TrajansRow | 4 days ago
Unfortunately, IDE integration like this tends to be very prefill-heavy, i.e. compute-bound rather than memory-bandwidth-bound (more math than memory). That puts Apple Silicon at a disadvantage without the feature we're talking about. Presumably the upcoming M5 will also have dedicated matmul acceleration in the GPU. That could tip things decisively in favor of local AI, particularly on portable machines like laptops.
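Rough intuition for the prefill/decode split (a back-of-envelope sketch with illustrative numbers, not a measurement): during prefill every weight read is reused across all prompt tokens, so arithmetic intensity grows with prompt length and the matmul units become the bottleneck; during decode the weights are streamed for a single token, so memory bandwidth dominates.

    # Arithmetic intensity of one d x d weight matmul in fp16 (illustrative numbers).
    def arithmetic_intensity(tokens: int, d: int = 4096) -> float:
        flops = 2 * tokens * d * d   # one multiply-accumulate per weight per token
        bytes_moved = 2 * d * d      # weights read once, 2 bytes per fp16 param
        return flops / bytes_moved   # FLOPs per byte of weight traffic

    print(arithmetic_intensity(tokens=1))     # decode: one token at a time -> ~1 FLOP/byte
    print(arithmetic_intensity(tokens=4096))  # prefill: whole prompt at once -> ~4096 FLOPs/byte

That's also why Apple Silicon's unified-memory bandwidth makes decode feel fine while long-prompt agentic prefill crawls without dedicated matmul hardware.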