| ▲ | kaydub 2 hours ago | |||||||||||||||||||||||||||||||||||||||||||||||||
I feel like it was doing what you're saying about 4-6 months ago. Especially the commenting out tests. Not always but I'd have to do more things step by step and keep the llm on track. Now though, the last 3-4 months, it's writing decent unit tests without much hand holding or refactors. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| ▲ | joshstrange 2 hours ago | parent [-] | |||||||||||||||||||||||||||||||||||||||||||||||||
Hmm, my last experience was within the last 2 months but I'm trying not to write it off as "this sucked and will always suck", that's the #1 reason I keep testing and playing with these things, the capabilities are increasing quickly and what did/didn't work last week (especially "last model") might work this week. I'll keep testing it but that just hasn't been my experience, I sincerely hope that changes because an agent that runs unit test [0] and can write them would be very powerful. [0] This is a pain point for me. The number of times I've watching Claude run "git commit --no-verify"... I've told it in CLAUDE.md to never bypass commit checks, I've told it in the prompt, I've added it 10 more times in different places in CLAUDE.md but still, the agent will always reach for that if it can't fix something in 1-3 iterations. And yes, I've told it "If you can't get the checks to pass then ask me before bypassing the checks". It doesn't matter how many guardrails I put up and how good they are if the agent will lazily bypass them at the drop of a hat. I'm not sure how other people are dealing with this (maybe with agents managing agents and checking their work? A la Gas Town?). | ||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||