all2 | 5 days ago
I'd be glad to hear more. I'm not certain what I would even ask, as the space is really fuzzy (prompting and all that). I've got an Ollama instance (24GB VRAM) I want to leverage to try to reduce my dependency on Claude Code. Even the tech stack seems unapproachable. I've considered LiteLLM, router agents, micro-agents (the smallest slice of functionality possible), etc. I haven't wrapped my head around it all the way, though. Ideally, it would be something like:
Where the UI is probably aider or something similar. Claude Code muddies the differentiation between UI and agent (with all the built-in system-prompt injection). I imagine I would like to move system-prompt injection / agent CRUD into the agent shim. I'm just spitballing here. Thoughts? (My email is in my profile if you would prefer to continue there.)
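A minimal sketch of what that agent shim could look like — all names here are hypothetical, assuming an OpenAI-style chat-messages format like the one Ollama exposes:

```python
# Hypothetical agent-shim sketch: the shim owns system-prompt injection,
# so the UI (aider or similar) only ever passes user turns through.

AGENT_PROMPTS = {
    # agent name -> system prompt; editing this dict is the "agent CRUD"
    "coder": "You are a careful coding assistant. Prefer small diffs.",
    "reviewer": "You review patches for bugs and style issues.",
}

def shim_request(agent: str, user_messages: list[dict]) -> list[dict]:
    """Prepend the named agent's system prompt before forwarding to the model."""
    system = AGENT_PROMPTS[agent]
    return [{"role": "system", "content": system}] + user_messages

msgs = shim_request("coder", [{"role": "user", "content": "Refactor this loop."}])
# msgs[0] is the injected system prompt; msgs[1] is the untouched user turn
```

The point of the split is that the UI stays dumb: swapping aider for another front end doesn't change which prompts get injected.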
CuriouslyC | 5 days ago | parent
I also have a 24GB card. Local LLMs are great for a lot of things, but I wouldn't route coding questions to them; the time/$ tradeoff isn't worth it.

Also, don't use LiteLLM, it's just bad; Bifrost is the way. You can use an LLM router to direct questions to an optimal model on a price/performance Pareto frontier. I have a plugin for Bifrost that does this, Heimdall (https://github.com/sibyllinesoft/heimdall). It's very beta right now, but the test coverage is good; I just haven't paved the integration pathway yet.

I've got a number of products in the works to manage context automatically, enrich/tune RAG, and provide enhanced code search. Most of them are public, so you can poke around and see what I'm doing. I plan on doing a number of launches soon, but I like to build rock-solid software, and rapid agentic development creates a large manual QA/acceptance-eval burden.
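The routing idea can be sketched in a few lines — this is an illustrative toy, not Heimdall's actual logic, and the model names, costs, and quality scores are made up:

```python
# Hypothetical price/performance routing sketch: each model gets a cost per
# million tokens and a rough quality score; route to the cheapest model whose
# quality clears an estimated difficulty for the request.

MODELS = [
    # (name, cost_per_mtok_usd, quality 0..1) -- illustrative numbers only
    ("local-qwen-14b", 0.0, 0.55),
    ("mid-tier-api", 1.0, 0.75),
    ("frontier-api", 15.0, 0.95),
]

def route(difficulty: float) -> str:
    """Pick the cheapest model on the frontier that can handle the request."""
    viable = [m for m in MODELS if m[2] >= difficulty]
    if not viable:
        # nothing clears the bar: fall back to the most capable model
        return max(MODELS, key=lambda m: m[2])[0]
    return min(viable, key=lambda m: m[1])[0]

route(0.5)   # easy question -> free local model
route(0.9)   # hard question -> frontier model
```

In practice the hard part is estimating `difficulty` for a given prompt (a classifier, heuristics, or a small scoring model), which is where a real router earns its keep.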