Remix.run Logo
flir 3 days ago

How can the agent hide the error?

You log the interaction, you see what happened, no?

wongarsu 3 days ago | parent [-]

In coding agents that would be "the test keeps failing and I can't fix it - let's delete the test" or "I can't fix this bug, let's delete the feature"

If you measure success by unit test failures or by the presence of the bug those behaviors can obscure that the LLM wasn't able to do the intended fix. Of course a closer inspection will still reveal what happened, but using proxy measurements to track success is dangerous, especially if the LLM knows about them or if the task description implies improving that metric "a unit test is failing, fix that"

flir 3 days ago | parent [-]

Sure, but the discussion here is around "in production"? I'm trying to imagine a scenario and I'm coming up short.

sebastiennight 3 days ago | parent [-]

In GP's comment, the coding agent is deployed "in production" since you (the developer) and/or your company are paying for it to use it in your business.

flir 3 days ago | parent [-]

"Introducing this in large customer pipelines or in intensive data applications"

*shrug*

To be honest, I don't think I'm going to get an answer.