shad42 6 hours ago
Nice, it's on our todo list to use OSS models too. What are you building?
2001zhaozhao 4 hours ago
The basic idea is to run a 24/7 SWE agent loop on local hardware to maximize cost effectiveness. The agents keep running and refining development tasks on a project board. When the human is in the loop, they do just enough to complete the tasks, with semi-frequent human review. Whenever the human is absent or human feedback becomes the bottleneck, they instead start autonomously debating, delegating to cloud LLMs, etc., to try to clear the bottlenecks on their own. So essentially the system tries to do useful work to fully utilize the hardware, an optimization specific to local models (you'd never need this with cloud models).

The local models would also be queryable on demand (which takes priority over the 24/7 tasks) as cheap inference. The idea is that in user-queried interactive tasks, the main Claude agent mostly gets summaries from the other agents and makes decisions based on them, saving a ton of tokens compared to giving it direct access to the codebase. These small-model calls would preferentially route to my local model to save costs, but overflow to a cloud provider if demand is momentarily too high.
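Roughly, the routing could look something like this (a minimal sketch; the client objects and their complete() method are placeholders, not a real API):

    import threading

    class InferenceRouter:
        # Sketch: on-demand queries take priority over the 24/7 loop,
        # prefer the local model, and overflow to a cloud provider
        # only when the local GPU is saturated.
        def __init__(self, local_client, cloud_client, max_local_inflight=4):
            self.local = local_client    # wrapper around the local model server
            self.cloud = cloud_client    # paid overflow provider
            self.slots = threading.Semaphore(max_local_inflight)

        def complete(self, prompt, interactive=False):
            if interactive:
                # Interactive callers wait briefly for local capacity.
                got_slot = self.slots.acquire(timeout=2.0)
            else:
                # Background agents yield: run only if a slot is free now.
                got_slot = self.slots.acquire(blocking=False)
            if got_slot:
                try:
                    return self.local.complete(prompt)
                finally:
                    self.slots.release()
            if interactive:
                # Demand momentarily too high: overflow to the cloud.
                return self.cloud.complete(prompt)
            return None  # background caller backs off and retries later

The point of the semaphore is that the background loop only grabs free capacity instead of queuing up on the GPU, so an interactive query almost always finds a local slot within the timeout before any money gets spent on cloud tokens.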