noduerme 7 hours ago

Question for people who have spent more time than I have wrangling agents to manage other agents:

I've been using a Claude Pro plan just as a code analyzer / autocomplete for a year or so. But I recently decided to try to rewrite a very large older code base I own, and set up an AI management system for it.

I started this last week, after reading about paperclip.ing. But my strategy was to layer the system in a way I felt comfortable with, so I set up something that now feels a bit like a Rube Goldberg machine. What I did was set up a clean box and give my Claude Pro plan root access to it, then set up openclaw on that box, but not with root... so that if it ran wild, I could intervene. Then I had openclaw set up paperclip.ing.

The openclaw instance is on a separate Claude API account and is already burning what seems like way too many tokens, but it does now have a lot of memory of the project, and in fairness, for the $150 I've spent, it has rewritten an enormous chunk of the code in a satisfactory way (with a lot of oversight). I do like being able to WhatsApp with it - that's a huge bonus.

But I feel like maybe this is a pretty wasteful way of doing things. I've heard I might be able to run openclaw through my Claude Pro plan without paying for API usage, but I've also heard that Anthropic might be shutting down that OAuth pathway. And I've heard people say openclaw just thoroughly sucks, although I've been pretty impressed with its results.

The general strategy I'm taking is to have Claude read the old codebase side by side with me in VSCode and prepare documents for openclaw to act on as editor, then re-evaluate; then have openclaw produce documents for agent roles in Paperclip and evaluate those.

Am I just wasting my money on all these API calls? $150 so far doesn't seem bad for the amount of refactoring I've gotten -- across a database, back end, and front end at the same time -- which I'm pretty sure Claude Pro could not have handled without much more file-by-file supervision. I'm slightly afraid now to abandon the memory I've built up with openclaw and switch to a different tool. But hey, maybe I should just be doing this all on the Claude Pro CLI at this point...?

Looking for some advice before I try to switch this project to a different paradigm. But I'm still testing this as a structure, and trying to figure out the costs.

[Edit: I see so many people talking about these lighter-weight frameworks meant for driving an agent through a large, long-running code-building task -- superpowers, GSD, etc. -- which, to me as a solo coder, sound very appealing if I were building a new project. But for taking 500k LOC and a complicated database and refactoring the whole thing into a headless version that can be run by agents, which is what I'm doing now, I'm not sure those are the right tools. At the same time, I've never heard anyone say openclaw was a great coding assistant -- all I hear about it being used for is, like, spamming Twitter or reading your email or ordering lunch for you. But I've only used it as a code manager, not for any daily tasks, and I'm pretty impressed with its usefulness at that...]

wyre 6 hours ago | parent [-]

Ya, openclaw is overkill for rewriting a codebase, especially when you're paying API costs.

I developed my own task tracker (github.com/kfcafe/beans), though I'm not sure how portable it is; it's been a while since I've used it in Claude Code. I've been using pi-coding-agent for the past few months -- highly recommend; it's what openclaw is built on top of. Anthropic hasn't shut down OAuth; they just say it's banned outside of Claude Code. I'd recommend installing pi, telling it what you were doing with openclaw, and having it port all of that context over to the pi installation.

You could also check out Ralph Wiggum loops -- they could be a good way to rewrite the codebase. Just write a prompt describing what you want done, then write a bash loop calling Claude's CLI pointed at the prompt file. The agent will run in a loop until you decide to stop it. It's also not the most efficient use of tokens, but at least you'd be using Claude Pro instead of spending money on API calls.
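A minimal sketch of such a loop, assuming the `claude` CLI is on your PATH, that `-p` is its non-interactive print mode, and that your prompt tells the agent to create a `DONE` marker file when it's finished (the flag and the marker convention are assumptions; adjust to your setup):

```shell
# "Ralph Wiggum" loop: feed the same prompt file to an agent CLI over and over
# until a DONE marker appears or an iteration cap is hit.
ralph_loop() {
  prompt_file="${1:-PROMPT.md}"
  max_iters="${2:-50}"
  agent_cmd="${AGENT_CMD:-claude -p}"   # override for dry runs, e.g. AGENT_CMD=echo
  i=1
  while [ "$i" -le "$max_iters" ]; do
    $agent_cmd "$(cat "$prompt_file")" || break   # stop if the CLI errors out
    [ -f DONE ] && break                          # prompt asks the agent to touch DONE
    i=$((i + 1))
  done
  echo "$i"   # report the iteration we stopped on
}
```

The cap and the `DONE` check are just guardrails so the loop can't run unattended forever -- which matters more when each pass burns a Pro-plan session.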

noduerme 6 hours ago | parent [-]

I'm kinda doing this in a back-and-forth way over each section with openclaw, and one nice thing is that I've got it including the chat log for the changes with each commit. I'm happy with how it's adapted to my need to understand all the changes it's making before committing. So I do want something interactive like that -- this isn't a codebase I can trust an LLM to just fire and forget. (As evidence: at some point the LLM accidentally crossed and merged message strings and parameter names like "_meta", ".meta", and "_META", which meant completely different things, before I caught it and forced it to untangle the whole mess -- which it only did well because there were good logs.)
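For what it's worth, that class of mix-up can also be caught mechanically, by listing every distinct spelling of a suspect identifier before and after an agent pass. A rough sketch -- the `meta` pattern comes from the incident above; the directory argument is a placeholder:

```shell
# List each spelling of a "meta"-like identifier in a tree, with counts,
# so a human can spot suspicious merges like _meta collapsing into _META.
audit_meta_variants() {
  dir="${1:-.}"
  # -i matches case-insensitively, but -o prints the text as it appears in
  # the files, so each spelling (_meta, .meta, _META, ...) gets its own line.
  grep -rhioE '[_.]meta' "$dir" | LC_ALL=C sort | uniq -c | sort -rn
}
```

Running it before and after a refactor and diffing the counts makes a silent rename jump out, without trusting the agent's own summary of what it changed.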

I sort of do need something with persistent memory and personality... or a way to persist that without spending a lot of time bringing it back up to speed. It's not exactly specific tasks being tracked; I need it to have a fairly good grasp of the entire ecosystem.

wyre 5 hours ago | parent [-]

How big is the codebase, and how often is the agent writing to memory? You might be able to get away with just appending it to the project's CLAUDE.md. You might also want to check out https://github.com/probelabs/probe
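The append can literally be a few lines of shell at the end of a session; a sketch, where `summary.md` and the "Session notes" heading are just made-up conventions:

```shell
# Append a dated session summary to CLAUDE.md so the next session inherits
# the project memory instead of starting cold.
append_session_memory() {
  summary_file="$1"
  claude_md="${2:-CLAUDE.md}"
  {
    printf '\n## Session notes (%s)\n\n' "$(date +%F)"
    cat "$summary_file"
  } >> "$claude_md"
}
```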

noduerme 5 hours ago | parent [-]

Hm. That looks a lot more granular, which is interesting... I'm not sure it would help me on this.

The codebase is small enough that I can basically go find all the changes the LLM executed for each request, read them with a very skeptical eye to verify they look sane, and ask it why it did something, or whether it made a mistake, if anything smells wrong. That said, the code I'm rewriting is a genetic algorithm / evaluation engine I wrote years ago, which itself writes code that it then evaluates. So the challenge is having the LLM change the control structure -- with the aim of letting an agent run the system at high speed and read the result stream through a headless API -- without breaking either the writing or the evaluation of the code that the codebase itself writes and runs. Openclaw has a surprisingly good handle on this now, after a very, very long-running session, but most of the problems I'm still hitting have to do with it not understanding that modifying certain parameters or names can cause downstream effects in the output (eval code) or input (load files) of the system as it evolves.