zmmmmm 18 hours ago

It's good to see experts share the scepticism about agents that I have. I don't doubt they will be useful in some settings, but they lean into all the current weak points of large language models and make them worse: security, reproducibility, hallucinations, bias, etc.

With all these issues already being hard to manage, I just don't believe businesses are going to delegate processes to autonomous agents in a widespread manner. Literally anything that matters is going to get implemented in a controlled workflow that strips out all the autonomy, with a human checkpoint at every step. They may call them agents just to sound cool, but it will be completely controlled.

Software people are all fooled by what is really a special case around software development: outcomes are highly verifiable and mistakes (in development) are almost free. This is just not the case out there in the real world.

theptip 14 hours ago

Fully autonomous agents are marketing fluff right now, but there is like $10T of TAM from promoting most knowledge workers to a manager and automating the boring 80% of their work, and this doesn’t require full autonomy.

Karpathy’s definition of “agent” here is really AGI (probably somewhere between expert and virtuoso AGI https://arxiv.org/html/2311.02462v2). In my taxonomy you can have non-AGI short-task-timeframe agents. E.g. in the METR evals, I think it’s meaningful to talk about agent tasks if you set the thing loose on tasks that take a human 4-8h.

mexicocitinluez 17 hours ago

> Literally anything that matters is going to get implemented in a controlled workflow that strips out all the autonomy, with a human checkpoint at every step.

Yea, there aren't a ton of problems (that I can see) in my current domain that could be solved by having unattended agents generating something.

I work in healthcare and there are a billion use cases right now, but none that don't require strict supervision. For instance, having an LLM process history and physicals from potential referrals, looking for patient problems and extracting historical information, is cool, but it's nowhere near reliable enough to do anything but present that info back to the clinician to have them verify it.