Are you assuming a brute force "burn tokens until it passes the tests" model, or is there a really sweet approach on the horizon that is impractical at current token costs?

I'm asking 'cos while I'm philosophically opposed to the first option, but I'd love to hear about anything that resembles the second.

▲

a day ago | parent | next [-]

[deleted]

▲

SpicyLemonZest a day ago | parent | prev [-]

One idea I've heard is prototype-first design reviews. If the cost of code genuinely trends to zero, there's no reason why most technical disagreements about product functionality couldn't come with prototypes to illustrate each side of the debate. Today, that's not always practical between token costs and usage limits.

▲

pydry 21 hours ago | parent [-]

What if the agent fucks up the better approach but does a good job of the worst approach?

	▲	SpicyLemonZest 20 hours ago \| parent [-]
		Then hopefully the reviewers will notice that the first prototype's flaws are correctable. Sometimes they won't, and they'll end up making a bad decision, just as they sometimes make bad decisions today with no prototypes to look at. But having prototypes allows for a lot of debates that are today vague and meandering to be reduced to "which of these assertions at the end of this integration test do you think is the correct behavior?".