Remix.run Logo
spacedcowboy 4 hours ago

It actually got quite snippy with me, when I was trying to get it to debug some issues. It kept on saying that the "burn rate" wasn't progressing and "we" should refocus our efforts somewhere else. Eventually I got something like "I have told you three times now that this is not the best approach to be taking to reduce the burn-rate and you have not taken that advice". And it stopped helping out.

So I was blunt, and said "I don't care about the burn-rate on some hypothetical chart that you produced at the start. I care about removing bugs and having a robust product, which this approach is satisfactorily doing. We will continue along this path, if the tests are not showing gain, then the tests are poorly designed".

At which point it got all apologetic, wrote new memories, and we didn't have a problem thereafter.

The issue was that I was attacking a huge bug-surface, and although each bug-fix was valid, correct, and helped move the dial, it didn't move the dial on the test-bed that Claude had created to measure its work against. There were too many inter-connected bugs for a single fix to really make a difference to these higher-level tests. I knew it was going to take a while to get through them, but apparently Claude didn't.

You try changing the size of a pointer from 2 bytes to 3 bytes on a compiler[1] for a 6502 while introducing automatically-tracked bank-switching on your memory-managed pointers, and see how many code-sites that impacts [grin].

[1]: https://atari-xt.com

regularfry 3 hours ago | parent | next [-]

That sounds more like a spec change than a set of bug fixes, even if the conclusion is that the potentially implicit spec you started with was incorrect. I've had an interesting experience extracting a spec from some existing code, making some modifications, then saying basically "implement this spec, don't come back until you're done".

An interesting experiment would be to try having the agent annotate the code with the relevant spec section while it's extracting the spec, then to have the agent update the spec with the new requirement - as an explicit change with something like "This section updated in V2 with...." - and have the agent update the codebase from that.

Some of these problems do just need breaking down a little further than you'd think to make the agent's life easier. This might be one of them.

Animats 2 hours ago | parent | prev | next [-]

> It kept on saying that the "burn rate" wasn't progressing and "we" should refocus our efforts somewhere else.

It sounds like a boss. How soon will it be?

HDThoreaun 3 hours ago | parent | prev [-]

Dont argue with LLMs. Sometimes they lose the plot, when that happens simply flush the context and start over.