| ▲ | alexhans 6 hours ago | |
That's a decent practice from the lens of reducing blast radius. It becomes harder when you start thinking about unattended systems that don't have you in the loop. One problem I'm finding discussion about automation or semi-automation in this space is that there's many different use cases for many different people: a software developer deploying an agent in production vs an economist using Claude Vs a scientist throwing a swarm to deal with common ML exploratory tasks. Many of the recommendations will feel too much or too little complexity for what people need and the fundamentals get lost: intent for design, control, the ability to collaborate if necessary, fast iteration due to an easy feedback loop. AI Evals, sandboxing, observability seem like 3 key pillars to maintain intent in automation but how to help these different audiences be safely productive while fast and speak the same language when they need to product build together is what is mostly occupying my thoughts (and practical tests). | ||
| ▲ | daveguy 5 hours ago | parent [-] | |
Current LLMs are nowhere near qualified to be autonomous without a human in the loop. They just aren't rigorous enough. Especially the "scientist throwing a swarm to deal with common ML exploratory tasks." The judgement of most steps in the exploratory task require human feedback based on the domain of study. > Many of the recommendations will feel too much or too little complexity for what people need and the fundamentals get lost: intent for design, control, the ability to collaborate if necessary, fast iteration due to an easy feedback loop. Completely agreed. This is because LLMs are atrocious at judgement and guiding the sequence of exploration is critically dependent on judgement. | ||