Remix.run Logo
baq 2 hours ago

You didn’t do it enough. They stop finding bugs eventually. Also, different models can find different bugs (though they do find the same ones, too, which is good and expected). For best results you want to run multi model reviews in loops.

If you had multiple people look at your PRs multiple times on different days results would be very similar.

PunchyHamster an hour ago | parent | next [-]

I've had it find bug, I asked it to make test to trigger the bug, and then it figured out it's not a bug. It will absolutely do wish fulfilment

left-struck an hour ago | parent | next [-]

Yeah when these models find a bug i like to ask it to write a test that will fail if the bug is real and pass when the bug is solved.

It’s not perfect but usually it works pretty well, and I’ve had the model come back to me with oh actually the test passed, the bug doesn’t work exist

As a bonus, you’ve now got a test that can detect that bug if it comes up again.

csomar 29 minutes ago | parent | prev [-]

It'll find a non-existent bug - fix it - figure out it broke a previously working thing - try to fix again - etc..

The "keep improving" the code base prompt have been tried and it never works. The LLM has no consciousness of where to stop and where to draw the lines of reasonableness.

MallocVoidstar 2 hours ago | parent | prev [-]

No, depending on the complexity of the issue models can be into loops, where they go "this is definitely an issue and must be fixed", and then the resulting fixed code gets "this is definitely an issue and must be fixed", and then the resulting fixed code has the original 'issue'.

bfjvibybd6cuvu6 an hour ago | parent | next [-]

That's a different kind of loop.

For a normal review loops you can ask the model to return with nothing found if nothing is found and not invent things and it will do a better job of exiting without anything found.

memoriyato3 an hour ago | parent | prev [-]

yeah, happened to me: "A is very wrong, you should do B", and on the next fresh review loop "B is very wrong, you should do A"

typically this means there is some ambiguity in the specification, and the model flips between alternative interpretations

bluenose69 21 minutes ago | parent [-]

I get this sometimes when I ask the agent on GitHub to suggestion improvements to my Julia code. It's kind of fun to watch it struggle to please. I'm reminded of the old "Doctor" mode in Emacs.