manojlds 9 hours ago

Which is your own harness and your own evals for your tasks, I guess.

munk-a 6 hours ago | parent | next [-]

I don't demand a customized compiler for my code even if such a compiler could outperform gcc. There is a lot of value in focusing on correctness to an extreme degree, even if the outcome might be suboptimal compared to something more tailored - a tool with a large customer base can justify more resources going into its maintenance.

sanderjd 8 hours ago | parent | prev [-]

Maybe. But that sounds like a large amount of bespoke work for what seems like a common problem?

manojlds 7 hours ago | parent [-]

I was talking about enterprise agents and then realized the question is more about coding agents.

sanderjd 7 hours ago | parent [-]

Ah I see! Yes, I was talking about a coding harness, not an enterprise agent. I entirely agree with you that your suggestion of driving it via evals is the right thing for that use case!