| ▲ | dreadnip 2 hours ago | ||||||||||||||||||||||||||||||||||||||||||||||||||||
The problem I have with this workflow is that the models are still too eager to please. If I ask it to scan a release and note possible issues, it absolutely will find issues. If I keep running the same prompt, it will keep finding issues. I’ve spammed GitHub PR reviews and it just keep finding (or inventing?) new issues. There is never a “Nothing found, good to go!”. I have to keep reminding myself that the model will always give me what I ask for, regardless of the reality/truth. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | mejutoco 5 minutes ago | parent | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
You could ask the model to say "nothing found" if the improvement was stylistic, or other constraints. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | baq 2 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
You didn’t do it enough. They stop finding bugs eventually. Also, different models can find different bugs (though they do find the same ones, too, which is good and expected). For best results you want to run multi model reviews in loops. If you had multiple people look at your PRs multiple times on different days results would be very similar. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | imhoguy an hour ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
You need to create review skill and there define what "issue" or "good" are for you to limit sensitiviness. Otherwise you depend on model's random threshold or non of such then you get perfection chasing. Anyway it will never match your judgemend completely unless you upload your brain dump into model. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | higeorge13 27 minutes ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
You need to run them in review loops, this is the only way to reduce or eliminate these issues. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | embedding-shape 2 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
> There is never a “Nothing found, good to go!”. Like when you do recursive programming, have you tried providing more/better stop conditions? If you literally just say "Continue until there are no more issues" then it'll do just that, but if you scope it better, like "Only mention issues related to X, Y or that leads to Z" and so on, you'll get less noise and more focus on issues that actually matter (to you). | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | onion2k an hour ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
If I keep running the same prompt, it will keep finding issues. I've had the same experience, but whenever I've reviewed what it finds it's basically right. It's pedantic, and a lot of the problems aren't things I really care about, but they definitely are real problems. I'm not sure you can blame the AI for always finding problems if a) you asked it to, and b) there are problems to find. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | starquake an hour ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
I use Claude Code and one of the steps in my workflow is do a review loop until no issues are found and it never loops. So my experience is entirely different. Even if I say: fix all issues. So not only the critical issues. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | 9dev 2 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
There is a point of diminishing returns though; the issues suggested will get speculative, or point out comment unclarity, or "defense in depth". But I agree it’s somewhat annoying to rarely get clear pushback in terms of "no, this looks good enough to me, release it" | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | Myzel394 an hour ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
That's just plain wrong. The new models do not hallucinate as much as they used to (in my personal experience) | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | threatripper 2 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
You get the same result if you pay humans a good sum of money to find issues. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | Tiberium 2 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
I think this was true with older models, but at least with GPT 5.5 it can genuinely tell you "no issues found" after a few passes of finding real issues. | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | knorker 30 minutes ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
What do you mean? Are they valid flaws or not? Would you like it to stop when there's still flaws in the code? | |||||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | gib444 41 minutes ago | parent | prev [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||
It's not eagerness to please (that's anthropomorphising), rather it's a desire to bill you more money/use more tokens (The fixed prices are just temporary discounts) | |||||||||||||||||||||||||||||||||||||||||||||||||||||