oofbey 4 days ago

I’ve also had extremely poor luck getting any LLM agent to go through a long list of repetitive tasks. Don’t know why. I’d guess it’s because they’re trained for transactional responses, and are thus horrible at repeating anything.

ukuina 3 days ago | parent

Very much this.

You are better off asking it to write a script that invokes itself N times across the task list.
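
Something like this, for instance. The `llm_cli` command name and the prompt wording here are made-up placeholders for whatever model CLI you actually use, not any real tool’s interface:

    # One fresh, transactional call per task item instead of one long session.
    import subprocess

    tasks = open("tasks.txt").read().splitlines()

    for i, task in enumerate(tasks, 1):
        result = subprocess.run(
            ["llm_cli", f"Complete this single task and nothing else: {task}"],
            capture_output=True, text=True, check=True,
        )
        print(f"[{i}/{len(tasks)}] {result.stdout.strip()}")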

threecheese 3 days ago | parent

Same. I think there’s an untapped market (a feature, really) here, which, if it isn’t solved by GPT-next, will start to reveal itself as a problem more and more.

LLMs are really bad at being comprehensive, in general, and their comprehensiveness varies wildly from one inference to the next. Because LLMs are surprising the hell out of everyone with their abilities, less attention is paid to this; they can do a thing well, and for now that’s good enough. As we scale usage, I expect this gap will become more obvious and problematic (unless it’s solved in the model, like everything else).

A solution I’ve been toying with is something like a reasoning step, which could probably be done with mostly classical NLP, that identifies constraints up front and guides the inference to meet them: like a structured output, but at a session level. A rough sketch follows.
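
Roughly like this, say. The regex patterns and the coverage check are illustrative assumptions, not a worked-out design:

    import re

    def extract_constraints(prompt: str) -> list[dict]:
        # Pull explicit coverage requirements like "all 50 files" or
        # "each of the 12 endpoints" out of the prompt with plain regex.
        pattern = r"(?:all|each of the)\s+(\d+)\s+(\w+)"
        return [{"count": int(n), "unit": unit}
                for n, unit in re.findall(pattern, prompt, flags=re.I)]

    def satisfied(output: str, constraints: list[dict]) -> bool:
        # Crude coverage check: count enumerated items ("1.", "2)", ...)
        # and require at least as many as each constraint demands.
        items = re.findall(r"^\s*\d+[.)]", output, flags=re.M)
        return all(len(items) >= c["count"] for c in constraints)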

I am currently doing what you suggest, though: I have the agent create a script which invokes … itself … until the constraints are met, but that obviously requires that I stay engaged; I think it could be done autonomously, with at least much better consistency (at the end of the day, even that guiding hand is inference-based and therefore subject to the same challenges).
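
A minimal sketch of what closing that loop autonomously might look like, reusing the extract_constraints/satisfied helpers above; run_llm here is a stand-in for whatever agent call you’d actually make:

    from typing import Callable

    def run_until_satisfied(prompt: str, run_llm: Callable[[str], str],
                            max_rounds: int = 5) -> str:
        # Re-invoke the model until the extracted constraints pass, or give up.
        constraints = extract_constraints(prompt)  # from the sketch above
        current = prompt
        for _ in range(max_rounds):
            output = run_llm(current)
            if satisfied(output, constraints):
                return output
            # Feed the incomplete output back instead of restarting cold.
            current = (prompt + "\n\nYour previous attempt was incomplete; "
                       "cover every item this time. Previous output:\n" + output)
        raise RuntimeError("constraints still unmet after retries")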