fragmede 2 days ago

"Something" is broad and not well defined, but basically yeah. Rather than try to define it in terms of complexity of the something, I'll put it in terms of minutes. If the LLM returns a response, and that response gets fed into a system and run, and that's it, I wouldn't really call that agentic. It's got to go a few more rounds back and forth to be agentic, imo. In terms of time, I'd say the agent program has to be capable of at least 10 minutes of going from user input, then the program calling into the LLM, feeding the LLM response into a system, feeding that result back into the LLM, and feeding that into the system in a loop. Obviously there are ways to game that metric, like the terrible lines of code metric, but I think it's a decent handwave for when it feels like there's an agent working for me rather than a non-agentic system. What it's doing for those 10 minutes is important, calling "sleep 600" obviously doesn't count.

E.g. a programming agent with access to a computer should be able to, given design-doc.md and Todo.md, implement feature X, make sure it compiles, run some basic smoke tests, write appropriate unit tests, make sure they all pass, and finally push the code and create a draft PR.

Naturally, not every call into the agent is going to take the full 10 minutes. It may need to ask questions before getting started, or stop if there's an unrecoverable error. Sometimes you'll just need to tell it "continue", but the system should be capable of a 10-minute run (hopefully longer!) given enough support.
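
To make the loop I'm describing concrete, here's a rough sketch. Everything in it is made up for illustration: `run_agent`, `call_llm`, `run_in_system`, and the "DONE" / "ASK:" conventions are hypothetical names, not any real framework's API. The point is just the shape: LLM output goes into a system, the result goes back to the LLM, and that repeats until the work is done, the agent needs to ask something, something breaks, or the time budget runs out.

```python
import time
from typing import Callable, List, Tuple


def run_agent(
    task: str,
    call_llm: Callable[[str], str],        # hypothetical: context in, next action out
    run_in_system: Callable[[str], str],   # hypothetical: runs an action, returns its output
    budget_seconds: float = 600.0,         # the "10 minutes" from above
) -> Tuple[str, List[Tuple[str, str]]]:
    """Loop LLM -> system -> LLM until the agent stops or the budget is spent."""
    deadline = time.monotonic() + budget_seconds
    context = task
    rounds: List[Tuple[str, str]] = []

    while time.monotonic() < deadline:
        action = call_llm(context)

        # The agent says it is finished, or needs to ask the user a question.
        if action.startswith(("DONE", "ASK:")):
            return action, rounds

        try:
            result = run_in_system(action)
        except RuntimeError as err:
            # Treated here as the unrecoverable-error case: stop and report.
            return f"ERROR: {err}", rounds

        rounds.append((action, result))
        # Feed the result back so the next LLM call can react to it.
        context += f"\n$ {action}\n{result}"

    # Budget spent; the user could call again with the context and say "continue".
    return "OUT_OF_BUDGET", rounds
```

By my handwave, a run that comes back with zero or one entries in `rounds` is the "response gets fed into a system and run, and that's it" case. It only starts feeling agentic when that list keeps growing for a while with real work in it.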