CuriouslyC 5 days ago

I also have a 24GB card. Local LLMs are great for a lot of things, but I wouldn't route coding questions to them; the time/$ tradeoff isn't worth it. Also, don't use LiteLLM; it's just bad. Bifrost is the way.

You can use an LLM router to direct questions to an optimal model on a price/performance Pareto frontier. I have a plugin for Bifrost that does this, Heimdall (https://github.com/sibyllinesoft/heimdall). It's very beta right now, but the test coverage is good; I just haven't paved the integration pathway yet.
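For the unfamiliar, a router is essentially a function from prompt to model choice along that frontier. A toy sketch of the idea in Python (this is not Heimdall's actual logic; the model names and heuristics are made up for illustration):

    # Toy price/performance routing: send easy prompts to a cheap model,
    # hard ones to an expensive one. Thresholds and signals are invented.
    def route(prompt: str) -> str:
        """Pick the cheapest model expected to handle the prompt well."""
        hard_signals = ("refactor", "concurrency", "debug", "prove")
        if len(prompt) > 4000 or any(s in prompt.lower() for s in hard_signals):
            return "frontier-model"   # expensive, high capability
        return "budget-model"         # cheap, fine for routine asks

    print(route("rename this variable across the file"))  # budget-model
    print(route("debug this deadlock in the scheduler"))  # frontier-model

A real router would score prompts with a classifier and weigh live pricing rather than keyword heuristics, but the interface is the same.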

I've got a number of products in the works to manage context automatically, enrich and tune RAG, and provide enhanced code search. Most of them are public, and you can poke around and see what I'm doing. I plan on doing a number of launches soon, but I like to build rock-solid software, and rapid agentic development creates a large manual QA/acceptance-eval burden.

all2 5 days ago | parent [-]

So there's no place for a local LLM in code dev. Bummer. I was hoping to get past the 5-hour limits on Claude Code with local models.

CuriouslyC 5 days ago | parent [-]

Your best bet is the new DeepSeek; it's Claude Code compatible. Just point Claude Code at their Anthropic-compatible URL; they have instructions online.

all2 4 days ago | parent [-]

For the curious, here are the relevant docs: https://api-docs.deepseek.com/guides/anthropic_api
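Roughly what that looks like with the Anthropic Python SDK pointed at DeepSeek's endpoint (a sketch based on those docs; the base URL and model name should be verified against them):

    import os
    import anthropic

    # Anthropic-compatible client, but aimed at DeepSeek's endpoint.
    # Base URL and model name are taken from the docs linked above.
    client = anthropic.Anthropic(
        base_url="https://api.deepseek.com/anthropic",
        api_key=os.environ["DEEPSEEK_API_KEY"],
    )

    message = client.messages.create(
        model="deepseek-chat",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Say hello"}],
    )
    print(message.content[0].text)

Claude Code itself is configured the same way, via environment variables pointing it at that base URL, per the guide.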