all2 | 5 days ago
I'd be glad to hear more. I'm not certain what I would even ask, as the space is really fuzzy (prompting and all that). I've got an Ollama instance (24GB VRAM) I want to leverage to try to reduce my dependency on Claude Code. Even the tech stack seems unapproachable. I've considered LiteLLM, router agents, micro-agents (the smallest slice of functionality possible), etc. I haven't wrapped my head around it all the way, though. Ideally, it would be something like:
Where the UI is probably aider or something similar. Claude Code muddies the differentiation between UI and agent (with all the built-in system-prompt injection). I imagine I would like to move system-prompt injection / agent CRUD into the agent shim. I'm just spitballing here. Thoughts? (My email is in my profile if you would prefer to continue there.)
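A minimal sketch of what that agent shim could look like — all names here are hypothetical, assuming an OpenAI-style chat-messages format like the one Ollama exposes:

```python
# Hypothetical agent-shim sketch: the shim owns system-prompt injection,
# so the UI (aider or similar) only ever passes user turns through.

AGENT_PROMPTS = {
    # agent name -> system prompt; editing this dict is the "agent CRUD"
    "coder": "You are a careful coding assistant. Prefer small diffs.",
    "reviewer": "You review patches for bugs and style issues.",
}

def shim_request(agent: str, user_messages: list[dict]) -> list[dict]:
    """Prepend the named agent's system prompt before forwarding to the model."""
    system = AGENT_PROMPTS[agent]
    return [{"role": "system", "content": system}] + user_messages

msgs = shim_request("coder", [{"role": "user", "content": "Refactor this loop."}])
# msgs[0] is the injected system prompt; msgs[1] is the untouched user turn
```

The point of the split is that the UI stays dumb: swapping aider for another front end doesn't change which prompts get injected.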
CuriouslyC | 5 days ago | parent
I also have a 24GB card. Local LLMs are great for a lot of things, but I wouldn't route coding questions to them; the time/$ tradeoff isn't worth it.

Also, don't use LiteLLM, it's just bad; Bifrost is the way. You can use an LLM router to direct questions to an optimal model on a price/performance Pareto frontier. I have a plugin for Bifrost that does this, Heimdall (https://github.com/sibyllinesoft/heimdall). It's very beta right now, but the test coverage is good; I just haven't paved the integration pathway yet.

I've got a number of products in the works to manage context automatically, enrich/tune RAG, and provide enhanced code search. Most of them are public, so you can poke around and see what I'm doing. I plan on doing a number of launches soon, but I like to build rock-solid software, and rapid agentic development creates a large manual QA/acceptance-eval burden.
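The routing idea can be sketched in a few lines — this is an illustrative toy, not Heimdall's actual logic, and the model names, costs, and quality scores are made up:

```python
# Hypothetical price/performance routing sketch: each model gets a cost per
# million tokens and a rough quality score; route to the cheapest model whose
# quality clears an estimated difficulty for the request.

MODELS = [
    # (name, cost_per_mtok_usd, quality 0..1) -- illustrative numbers only
    ("local-qwen-14b", 0.0, 0.55),
    ("mid-tier-api", 1.0, 0.75),
    ("frontier-api", 15.0, 0.95),
]

def route(difficulty: float) -> str:
    """Pick the cheapest model on the frontier that can handle the request."""
    viable = [m for m in MODELS if m[2] >= difficulty]
    if not viable:
        # nothing clears the bar: fall back to the most capable model
        return max(MODELS, key=lambda m: m[2])[0]
    return min(viable, key=lambda m: m[1])[0]

route(0.5)   # easy question -> free local model
route(0.9)   # hard question -> frontier model
```

In practice the hard part is estimating `difficulty` for a given prompt (a classifier, heuristics, or a small scoring model), which is where a real router earns its keep.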