Remix.run Logo
wongarsu 3 days ago

In coding agents that would be "the test keeps failing and I can't fix it - let's delete the test" or "I can't fix this bug, let's delete the feature"

If you measure success by unit test failures or by the presence of the bug those behaviors can obscure that the LLM wasn't able to do the intended fix. Of course a closer inspection will still reveal what happened, but using proxy measurements to track success is dangerous, especially if the LLM knows about them or if the task description implies improving that metric "a unit test is failing, fix that"

flir 3 days ago | parent [-]

Sure, but the discussion here is around "in production"? I'm trying to imagine a scenario and I'm coming up short.

sebastiennight 3 days ago | parent [-]

In GP's comment, the coding agent is deployed "in production" since you (the developer) and/or your company are paying for it to use it in your business.

flir 3 days ago | parent [-]

"Introducing this in large customer pipelines or in intensive data applications"

*shrug*

To be honest, I don't think I'm going to get an answer.