parpfish 17 hours ago

I’ll chime in to say that this happened to me as well.

My project would start off well, but it would eventually end up in a state where nothing could be fixed; the agent would burn tokens going in circles trying to fix little bugs.

So I’d tell the agent to come up with a comprehensive refactoring plan that would allow the issues to be recast in more favorable terms.

I’d burn a ton of tokens on the refactor, the little bugs would get fixed, but the agent would inevitably end up going in circles on something new.

danabramov 16 hours ago | parent [-]

Curious if you have thoughts on the second half of the post? That’s exactly what the author is suggesting a strategy for.

majormajor 10 hours ago | parent [-]

"Test the tests" is a big ask for many complex software projects.
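In practice, "testing the tests" usually means something like mutation testing: inject a small defect (a "mutant") into the code under test and check that the suite fails. A minimal sketch below, with hypothetical function names not taken from the thread:

```python
import operator

def make_is_adult(threshold_op):
    """Build an age check using the given comparison operator."""
    def is_adult(age):
        return threshold_op(age, 18)
    return is_adult

def run_suite(is_adult):
    """The test suite under evaluation. Returns True if all tests pass."""
    try:
        assert is_adult(18) is True
        assert is_adult(17) is False
        return True
    except AssertionError:
        return False

original = make_is_adult(operator.ge)  # age >= 18, the intended behavior
mutant = make_is_adult(operator.gt)    # age > 18, an off-by-one defect

assert run_suite(original)             # suite passes against the real code
killed = not run_suite(mutant)         # a good suite "kills" the mutant
print("mutant killed" if killed else "mutant survived")
```

Here the boundary test `is_adult(18)` is what kills the mutant; a suite without boundary cases would let it survive, which is exactly the weakness this step is meant to expose.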

Most human-driven coding and testing takes heavy advantage of being white-box: you know the implementation you're testing.

For open-ended complex-systems development, turning everything into black-box testing is hard. The LLMs, as noted in the post, are good at trying a lot of shit and inadvertently discovering stuff that passes incomplete tests without fully working. Or, if you're in straight-up yolo mode, they fuck up your tests because they misunderstood the assignment, my personal favorite.

We already know it's very hard to have exhaustive coverage for unexpected input edge cases, for instance. The stuff of a million security bugs.
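A toy illustration of that failure mode (the function name and scenario are made up for the example): every assertion below passes, so a loop of "iterate until green" would accept this implementation, yet it mishandles an input the suite never exercises.

```python
def normalize_discount(percent):
    """Clamp a discount percentage to the range 0..100."""
    # Bug: negative inputs pass through unclamped, but no test
    # below ever supplies one.
    return min(percent, 100)

# The (incomplete) black-box suite: all happy-path inputs.
assert normalize_discount(30) == 30
assert normalize_discount(150) == 100
assert normalize_discount(0) == 0
# Missing: assert normalize_discount(-10) == 0
# The unexercised edge case is where the bug lives.

print("all tests pass")
```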

So as you build more into your system, the combinatorial surface of "all possible actions that can be taken, in all possible orders" grows, and with it the difficulty of relying on LLMs looping over prompts until tests go green.