Claude Code: connect to a local model when your quota runs out (boxc.net)
76 points by fugu2 3 days ago | 20 comments
alexhans an hour ago
Useful tip. From a strategic standpoint of privacy, cost, and control, I went straight for local models, because that allowed me to baseline tradeoffs, made it easier to understand where vendor lock-in could happen, and kept me from getting too narrow in perspective (e.g. llama.cpp or OpenRouter depending on local vs. cloud [1]).

With the explosion in popularity of CLI tools (claude/continue/codex/kiro/etc.) it still makes sense to be able to do the same, even if you can use several strategies to subsidize your cloud costs (being aware of the privacy tradeoffs). I would absolutely pitch that, plus evals, as one small practice that will have compounding value for any "automation" you want to design in the future, because at some point you'll care about cost, risk, accuracy, and regressions.

[1] - https://alexhans.github.io/posts/aider-with-open-router.html
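To make the lock-in point concrete, here is a minimal sketch (mine, not from the linked post; the llama.cpp server port, API key, and model names are illustrative assumptions): with OpenAI-compatible endpoints, swapping a local backend for a cloud one is essentially a base-URL change.

    # Same client code drives a local llama.cpp server and OpenRouter,
    # since both expose OpenAI-compatible chat endpoints.
    # Port, key, and model names below are assumptions -- adjust to yours.
    from openai import OpenAI

    local = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")
    cloud = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")

    for client, model in [
        (local, "qwen2.5-coder-7b-instruct"),    # whatever llama.cpp loaded
        (cloud, "anthropic/claude-3.5-sonnet"),  # an OpenRouter model id
    ]:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "One-line hello world in C?"}],
        )
        print(resp.choices[0].message.content)

Running the same prompts through both is also about the cheapest possible eval harness for the cost/accuracy tradeoffs mentioned above.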
wkirby 27 minutes ago
My experience thus far is that the local models are (a) pretty slow and (b) prone to making broken tool calls. Because of (a), the iteration loop slows down enough that I wander off to do other tasks, which makes (b) far more problematic, because I don't see the breakage for who knows how long.

This is, however, a major improvement over ~6 months ago, when even a single-token `hi` from an agentic CLI could take >3 minutes to generate a response. I suspect the parallel processing in LM Studio 0.4.x and some better tuning of the initial context payload are responsible. 6 months from now, who knows?
hkpatel3 an hour ago
OpenRouter can also be used with Claude Code: https://openrouter.ai/docs/guides/claude-code-integration
btbuildem 21 minutes ago
I'm confused — wasn't this already available via env vars? ANTHROPIC_BASE_URL and so on; granted, you may have to write a thin proxy to wrap the calls to fit whatever backend you're using. I've been running CC with Qwen3-Coder-30B (FP8) and I find it just as fast, but not nearly as clever.
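For anyone who hasn't written one of those, here's a minimal sketch of the thin-proxy idea (stdlib only; the BACKEND URL is an assumption for an OpenAI-compatible local server, and streaming, system prompts, and tool calls are deliberately left out):

    # Accept Anthropic-style /v1/messages requests from Claude Code and
    # forward them to an OpenAI-compatible local server (llama.cpp,
    # LM Studio, etc.). Sketch only -- no streaming or tool calls.
    import json
    import urllib.request
    from http.server import BaseHTTPRequestHandler, HTTPServer

    BACKEND = "http://localhost:1234/v1/chat/completions"  # assumed endpoint

    class Proxy(BaseHTTPRequestHandler):
        def do_POST(self):  # Claude Code POSTs to /v1/messages
            body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
            # Flatten Anthropic content blocks into plain strings and
            # repackage as an OpenAI chat.completions request.
            upstream = {
                "model": body.get("model", "local-model"),
                "max_tokens": body.get("max_tokens", 1024),
                "messages": [
                    {"role": m["role"],
                     "content": m["content"] if isinstance(m["content"], str)
                     else "".join(b.get("text", "") for b in m["content"])}
                    for m in body.get("messages", [])
                ],
            }
            req = urllib.request.Request(
                BACKEND, data=json.dumps(upstream).encode(),
                headers={"Content-Type": "application/json"})
            with urllib.request.urlopen(req) as resp:
                text = json.loads(resp.read())["choices"][0]["message"]["content"]
            # Wrap the reply back into the Anthropic Messages response shape.
            out = json.dumps({
                "id": "msg_local", "type": "message", "role": "assistant",
                "model": upstream["model"],
                "content": [{"type": "text", "text": text}],
                "stop_reason": "end_turn",
                "usage": {"input_tokens": 0, "output_tokens": 0},
            }).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(out)))
            self.end_headers()
            self.wfile.write(out)

    HTTPServer(("127.0.0.1", 8080), Proxy).serve_forever()

Then ANTHROPIC_BASE_URL=http://127.0.0.1:8080 claude should route through it. A usable version would also need SSE streaming and tool-call translation, which is most of what the pre-built proxies mentioned elsewhere in this thread do for you.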
baalimago 2 hours ago
Or better yet: connect to some trendy AI (or web3) company's chatbot. It almost always outputs good coding tips.
eek2121 19 minutes ago
I gotta say, the local models are catching up quick. Claude is definitely still ahead, but things are moving right along. | ||||||||||||||||||||||||||||||||||||||
zingar an hour ago
I guess I should be able to use this config to point Claude at the GitHub Copilot licensed models (including Anthropic models). That's pretty great. About 2/3 of the way through every day I'm forced to switch from Claude (Pro license) to Amp's free tier, and the different ergonomics are quite jarring. Open source folks get Copilot tokens for free, so that's another pro license I don't have to worry about.
mcbuilder 15 minutes ago
Opencode has been a thing for a while now.
raw_anon_1111 36 minutes ago
Or just don't use Claude Code and use Codex CLI instead. I have yet to hit a quota with Codex working all day; I hit the Claude limits within an hour or less. This is with my regular $20/month ChatGPT subscription and my $200/year (company-reimbursed) Claude subscription.
esafak 34 minutes ago
Or they could just let people use their own harnesses again... | ||||||||||||||||||||||||||||||||||||||
swyx an hour ago
I mean, the other obvious answer is to plug into the Claude Code proxies that other model companies have made for you:

https://docs.z.ai/devpack/tool/claude
https://www.cerebras.ai/blog/introducing-cerebras-code

or, I guess, one of the hosted GPU providers. If you're basically a homelabber and wanted an excuse to run quantized models on your own device, go for it, but don't lie and mutter under your tin foil hat that it's a realistic replacement.