llbbdd 2 days ago

I've gotten accustomed lately to spending a lot of time in the GitHub Copilot / agent management page. In particular I've been having a lot of fun using agents to browse some of my decade-old throwaway projects: telling one to "setup playwright, write some tests, record screenshots/videos and commit them to the repo" works every time, and it's a great way to browse memory lane without spending my own time getting those projects building and running again.
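
Roughly, the kind of test the agent ends up committing looks like this. This is just a sketch using Playwright's Python API; the URL and output paths are placeholders for whatever the old project serves.

    # pip install playwright && playwright install chromium
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        # record_video_dir tells Playwright to capture a video of the session
        context = browser.new_context(record_video_dir="artifacts/videos/")
        page = context.new_page()
        page.goto("http://localhost:3000")  # the resurrected project, running locally
        page.screenshot(path="artifacts/home.png", full_page=True)
        context.close()  # the video file is finalized when the context closes
        browser.close()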

However, this means I'm now using the GitHub website and services 1000x more than I was previously, and their uptime stats are trending toward coin-flip territory.

If GitHub sold a $5000 box I could plug into a corner of my house to get that entire experience locally, I'd seriously consider it. I'm guessing I could get partway there by spending twice that on a Mac Pro, but I have no idea what the software stack would look like today.

Is there a fully local, out-of-the-box equivalent experience that anyone can vouch for? I've used local agents primarily through VS Code, but AFAIK that's limited to running a single active agent over your repo, and it's obviously constrained by the single M1 laptop I currently run it on. I know at least some people are managing local fleets of agents in some manner, but I really like how immensely easy GitHub has made it.

Aurornis 2 days ago | parent | next [-]

None of the open-weights models you can run locally will perform at the level of the hosted frontier models. Some of them are getting better, but the step down in output quality is very noticeable to me.

> If GitHub sold a $5000 box I could plug into a corner of my house to get that entire experience locally, I'd seriously consider it. I'm guessing I could get partway there by spending twice that on a Mac Pro, but I have no idea what the software stack would look like today.

Right now, the only reasons to host LLMs locally are that you want to do it as a hobby or that you're sensitive about data leaving your local network. If you just want a substitute for Copilot when GitHub is down, any of the hosted LLMs will work right away, with no up-front investment and lower overall cost. Most IDEs and text editors have built-in support for connecting to other hosted models, or plugins that add it.

> I know at least some people are managing local fleets of agents in some manner,

If your goal is to run fleets of agents in parallel, local LLM hosting is going to be a bottleneck. Familiarize yourself with some of the different tool options out there (Claude Code, Cline, even the new Mistral Vibe) and sign up for a cloud API. You can also check OpenRouter for more options. The cloud-hosted LLMs will absorb parallel requests without a problem.
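
To make that last point concrete, here's a sketch of fanning requests out to a hosted provider. It assumes the openai Python package pointed at OpenRouter's OpenAI-compatible endpoint, a key in OPENROUTER_API_KEY, and a placeholder model slug; swap in whatever model and provider you actually use.

    import os
    from concurrent.futures import ThreadPoolExecutor
    from openai import OpenAI

    client = OpenAI(base_url="https://openrouter.ai/api/v1",
                    api_key=os.environ["OPENROUTER_API_KEY"])

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="anthropic/claude-3.5-sonnet",  # placeholder slug
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    # Eight concurrent requests; a hosted API soaks this up, while a single
    # local box serving one model generally will not.
    prompts = [f"Summarize the TODOs in project {i}" for i in range(8)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(ask, prompts))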

llbbdd 2 days ago | parent [-]

Thank you, though it's a bit sad to hear that local inference isn't really at this level of performance yet. I was previously using the VS Code agent chat and playing with both OpenAI- and GitHub-hosted models, but I switched to using the GitHub web UI directly once my workflow became much more issue/PR-focused. Sounds like I should tighten up the more generic IDE-centric workflow and make switching providers a keyboard shortcut for when a given one is down. I haven't actually used Claude directly yet, but I think GitHub agents often use it under the hood anyway.

bastardoperator 2 days ago | parent | prev | next [-]

They do; it's called GHES.

https://docs.github.com/en/enterprise-server@3.19/admin/over...

"GitHub Enterprise Server is a self-hosted version of the GitHub platform"

AceJohnny2 2 days ago | parent | next [-]

You're not getting Copilot on the self-hosted version, which is what the parent was focusing on.

bastardoperator 21 hours ago | parent [-]

Incorrect. You can use GitHub Connect to sync licenses, which lets you license users on both GHEC and GHES at the cost of a single seat. You still need an entitlement for Copilot, but the fact is you can absolutely get access while storing none of your code on .com.

verst 2 days ago | parent | prev | next [-]

That does not include the Copilot-related APIs, though.

ModernMech 2 days ago | parent | prev [-]

I tried getting this set up at my university, and it was hell dealing with them. We ended up going with GitLab.

colechristensen 2 days ago | parent | prev [-]

An NVIDIA DGX Spark is $4000; pair that with a relatively cheap second box running GitLab in the corner and you'd have a pretty good local AI inference setup. (You'd probably have to write a nontrivial amount of software to get the setup where you want it.)
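
As a rough idea of the glue that "nontrivial amount of software" implies, here is a minimal sketch. It assumes an OpenAI-compatible inference server (llama.cpp's llama-server, Ollama, etc.) running on the Spark, the python-gitlab package talking to the local GitLab box, and placeholder hostnames, tokens, and project paths.

    import gitlab
    from openai import OpenAI

    llm = OpenAI(base_url="http://spark.local:8080/v1", api_key="unused")
    gl = gitlab.Gitlab("http://gitlab.local", private_token="REDACTED")

    project = gl.projects.get("me/old-throwaway-project")
    for issue in project.issues.list(state="opened"):
        # Ask the local model for a plan and post it back as an issue note
        reply = llm.chat.completions.create(
            model="local",  # whatever model the server has loaded
            messages=[{"role": "user",
                       "content": f"Plan a fix for: {issue.title}\n{issue.description}"}],
        )
        issue.notes.create({"body": reply.choices[0].message.content})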

The local models are right on the edge of being really useful: there's a tipping point where accuracy is high enough that getting things done is easy, versus the model getting continuously stuck. We're in the neighborhood.

Alternatively, just run GitLab locally and use one of the many hosted APIs; those are much more stable than GitHub. Honestly, just get yourself a Claude subscription.

smcleod 2 days ago | parent | next [-]

The DGX Spark is not good for inference, though; it's very memory-bandwidth limited, around the same as a lower-end MacBook Pro. You're much better off with Apple silicon for performance and memory size at the moment, but I'd recommend holding off until the M5 Max comes out early in the new year, as the M5 has vastly superior performance to any other Apple silicon chip thanks to its matmul instruction set.
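
The bandwidth point is easy to sanity-check with back-of-envelope math: each generated token streams roughly the whole model through memory once, so decode speed is bounded by bandwidth divided by model size. The figures below are approximate public specs, not benchmarks.

    spark_bw_gbs = 273    # DGX Spark LPDDR5X, ~273 GB/s
    m1_pro_bw_gbs = 200   # M1 Pro MacBook Pro, ~200 GB/s
    model_gb = 40         # e.g. a ~70B model at 4-bit quantization

    print(spark_bw_gbs / model_gb)   # ~6.8 tokens/s upper bound
    print(m1_pro_bw_gbs / model_gb)  # ~5.0 tokens/s -- "around the same"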

llbbdd 2 days ago | parent [-]

Oof, I was already considering an upgrade from the M1 but was hoping I wouldn't be convinced to go for the top of the line. Is the performance jump from the M# -> M# Max chips that substantial?

smcleod a day ago | parent | next [-]

The main jump is from anything to the M5, not because it's simply the latest but because it has matmul instructions similar to a CUDA GPU's, which fix the slow prompt processing of all previous-generation Apple silicon chips.

baby_souffle a day ago | parent | prev [-]

> Is the performance jump from the M# -> M# Max chips that substantial?

From the M1? Yes, absolutely. From an M3 it's marginal for now, but the M5 will probably make it definitive.

llbbdd 2 days ago | parent | prev [-]

I can't say I'm not tempted, looking at the Spark; I could probably save some cash on heating my house with that thing. Though yeah, unless there's some good software already built around a similar LLM workflow that I could use, it'd probably be wasted on me, or spend its time desperately trying to pay for itself with crypto mining.

Adding Claude to my rotation is starting to look like the option with the least amount of building the universe from scratch. I have to imagine it can be used in a workflow similar or identical to the Copilot one, where it can create PRs, make adjustments in response to feedback, etc.

colechristensen 2 days ago | parent [-]

> Though yeah, unless there's some good software already built around a similar LLM workflow that I could use, it'd probably be wasted on me, or spend its time desperately trying to pay for itself with crypto mining.

A big part of my success using LLMs to build software has been building the tools to use LLMs, with the LLMs themselves making that tool-building easy (and possible).

llbbdd 2 days ago | parent [-]

I tried this for a little while and couldn't really get passionate about it; I had too many other backlogged projects I was eager to tear into with LLMs, and I got impatient. That was a while ago, though, and the ROI on building my own tools has probably gotten a lot more attractive.

colechristensen 2 days ago | parent [-]

I started building my own tool set because I was doing too many projects with LLMs and getting frustrated: there was a very real need for organization and tooling to get repetitive, meaningless tasks out of the way and to keep all of my projects organized so I could see what was going on.

llbbdd 2 days ago | parent [-]

I'm convinced. :) I've got some time to kill in transit later today, maybe time to think about my setup a bit.