Aurornis | 2 days ago
None of the open-weights models you can run locally will perform at the same level as the hosted frontier models. Some of them are getting better, but the step down in output quality is very noticeable to me.

> If Github sold a $5000 box I could plug into a corner in my house and just use that entire experience locally I'd seriously consider it. I'm guessing maybe I could get partway there by spending twice that on a Mac Pro but I have no idea what the software stack would look like today.

Right now, the only reasons to host LLMs locally are if you want to do it as a hobby or you are sensitive about data leaving your local network. If you only want a substitute for Copilot when GitHub is down, any of the hosted LLMs will work right away with no upfront investment and lower overall cost. Most IDEs and text editors have built-in support for connecting to other hosted models, or plugins that add it.

> I know at least some people are managing local fleets of agents in some manner,

If your goal is to run fleets of agents in parallel, local LLM hosting is going to be a bottleneck. Familiarize yourself with some of the different tool options out there (Claude Code, Cline, even the new Mistral Vibe) and sign up for their cloud APIs. You can also check OpenRouter for more options. The cloud-hosted LLMs will absorb parallel requests without a problem.
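For a concrete picture of what absorbing parallel requests looks like, here is a minimal sketch that fans several completion requests out through OpenRouter's OpenAI-compatible endpoint. It assumes the openai Python client is installed and an OPENROUTER_API_KEY environment variable is set; the model slug and prompts are placeholders, not anything from the thread.

    # Sketch: parallel completions via OpenRouter's OpenAI-compatible API.
    # Assumes `pip install openai` and OPENROUTER_API_KEY in the environment.
    import os
    from concurrent.futures import ThreadPoolExecutor

    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )

    def complete(prompt: str) -> str:
        # Each call is an independent HTTP request, so the parallelism is
        # handled on the hosted side rather than by local hardware.
        resp = client.chat.completions.create(
            model="anthropic/claude-3.5-sonnet",  # placeholder model slug
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    prompts = [
        "Summarize this diff: ...",
        "Write a unit test for ...",
        "Refactor this function: ...",
    ]

    # Fire the requests concurrently; results come back in prompt order.
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        for answer in pool.map(complete, prompts):
            print(answer)

Swapping providers is mostly a matter of changing base_url, the API key, and the model slug, which is why editor plugins can fall back between them so easily.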
llbbdd | 2 days ago
Thank you; a bit sad to hear that local inference isn't really at that level of performance yet. I was previously using the VSCode agent chat and playing with both OpenAI and Github hosted models, but I switched to using the Github web UI directly once my workflow became a lot more issue/PR-focused. Sounds like I should tighten up the more generic IDE-centric workflow and bind a keyboard shortcut to switch providers when a given one is down. I haven't actually used Claude directly yet, but I think Github agents often use it under the hood anyway.