gwerbin 4 hours ago

I think it depends a lot on how automated the agent is and how long you let it run for. Full automation where you try to build an entire piece of software with agents... yeah, no, we are not there yet. At least not if you care about maintainability.

Short-lived, tightly scoped agents can do alarmingly thorough and high-quality knowledge work, as long as the work itself is relatively mechanical and can be carried out either in independent chunks or sequentially. For example, a research agent like the Gemini "deep research" tool can save hours of digging around the web and compiling information. With careful prompting, sufficient background context, and good self-evaluation tools, an agentic loop can do very detailed data analysis, carry out serious statistics and machine learning projects, produce high-quality data visualizations of the results, and put together a handy executive summary.

They occasionally hallucinate, go off track, get confused, and make mistakes. But they "know" everything that's been published in English for the last 200 years, they never get tired, and the code they write is good enough for throwaway scripting. The real power of agents being able to write code is that they can be extremely self-sufficient and flexible in carrying out these kinds of tree- and sequence-structured knowledge work tasks.

That's of course a different thing from "designing good software", which is neither tree-structured nor sequential, and requires a level of intelligence (for lack of a better term) that LLMs do not seem to be capable of, at least not yet. But that's a more specific thing than just writing code in order to get stuff done that happens to require code.