ilaksh 2 days ago

I think this was true in late 2023 or early 2024, but not necessarily in mid-2025 for most tasks, as long as they require some AI, aren't purely automation, and you use SOTA LLMs.

I used to build the way most of his examples do: just functions calling LLMs. I found it almost necessary due to poor tool selection, etc. But I think leading-edge LLMs like Gemini 2.5 Pro and Claude 4 are smart enough, and good enough at instruction following and tool selection, that creating workflows isn't necessarily better.

I do have a checklist tool and a delegate command, and I may break tasks down into separate agents. But the advantage of writing instructions and assigning tool commands, especially in an environment with a UI where it is easy to assign tool commands to agents and otherwise define them, is that it is more flexible and a level of abstraction above something like a workflow. Even a visual workflow is still programming, which is more brittle and more difficult to dial in.
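
Roughly what I mean, as a sketch (plain Python, hypothetical names, not any particular framework; the model's decision loop is stubbed out):

    # Hypothetical sketch: an "agent" is just instructions plus the tool
    # commands it is allowed to call. Editing the instructions or swapping
    # tools changes behavior without rewriting any control flow.

    def read_checklist(name: str) -> str:
        # Placeholder tool: a real version would load a stored checklist.
        return f"(contents of checklist '{name}')"

    def delegate(task: str) -> str:
        # Placeholder tool: hand a sub-task off to another agent.
        return f"(result of delegated sub-task: {task})"

    RESEARCH_AGENT = {
        "instructions": "Work through the project checklist; delegate anything "
                        "that needs a specialist.",
        "tools": {"read_checklist": read_checklist, "delegate": delegate},
    }

    def run_agent(agent: dict, request: str) -> str:
        # Stand-in for the real loop: the LLM reads the instructions, picks a
        # tool and its arguments each turn, and stops when it is done. Here
        # one step is hard-coded just to show the shape of the dispatch.
        tool_name, tool_args = "read_checklist", {"name": request}
        return agent["tools"][tool_name](**tool_args)

    print(run_agent(RESEARCH_AGENT, "quarterly report"))

The point is that the "program" is the instruction text and the tool list, which a non-programmer can edit.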

This was not the case 6-12 months ago, and it doesn't apply if you insist on using inferior language models (which most of them are). Only a handful are really good at instruction following and tool use, but I think it's worth using those and going with agents for most use cases.

The next thing to happen over the following year or two will be a massive trend of browser- and computer-use agents being deployed. That is another level of abstraction again. They might even incorporate really good memory systems, and they will surely have demonstration or observation modes that can extract procedures from humans using UIs. They will also learn (record) procedural details, for later optimization, while exploring based on verbal or written instructions.

clbrmbr a day ago | parent | next

I agree that the strongest agentic models (Claude Opus 4 in particular) change the calculus. They still need good context, but damn are they good at reaching for the right tool.

bonzini 2 days ago | parent | prev

The techniques in the post mostly amount to "model your problem as a data flow graph and follow it."

If you skip the modeling part and rely on something you don't control being good enough, that's faith, not engineering.
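
Concretely, that style is something like this sketch (plain Python, hypothetical step names, model call stubbed out):

    # Hypothetical sketch of the workflow style: the process is an explicit
    # data flow graph in code, and each node is a small, testable step.

    def call_llm(prompt: str) -> str:
        # Stub for the actual model call.
        return f"(model output for: {prompt[:40]}...)"

    def extract_fields(ticket: str) -> str:
        return call_llm(f"Extract customer, product and issue from: {ticket}")

    def classify(fields: str) -> str:
        return call_llm(f"Classify severity as low/medium/high: {fields}")

    def draft_reply(fields: str, severity: str) -> str:
        return call_llm(f"Draft a reply for a {severity} issue: {fields}")

    def handle_ticket(ticket: str) -> str:
        # The graph is explicit: ticket -> fields -> severity -> reply.
        # Every intermediate value can be logged, validated or unit tested.
        fields = extract_fields(ticket)
        severity = classify(fields)
        return draft_reply(fields, severity)

    print(handle_ticket("My order #123 arrived damaged."))

Every edge is visible, so each step can be checked in isolation.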

ilaksh 2 days ago | parent

I didn't say to skip any kind of problem modeling. I just didn't emphasize it.

The goal _should_ be to avoid doing traditional software engineering, or creating a system that requires typical engineering to maintain.

Agents with leading-edge LLMs allow smart users to have flexible systems that they can evolve by modifying instructions and tools. This requires less technical skill than visual programming.

If you are only using the LLM to handle a few wrinkles or a little bit of natural-language mapping, then you aren't really taking advantage of what these models can do.

Of course you can build systems with rigid workflows and a sprinkling of LLM integration, but for most use cases that's probably not the right default mindset for mid-2025.

Like I said, I was originally following that approach a little ways back. But things change. Your viewpoint is about a year out of date.

bonzini 2 days ago | parent

I understand that. You didn't answer the important point, which is that you can't be sure that what you have works if you don't encode the process. And encoding the process isn't really software engineering; abstractions for business rules management have existed for decades and can be reused in this context.
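
For example, even something this small is the kind of decades-old abstraction I mean (hypothetical Python sketch, names made up): a declarative decision table that gates or routes whatever the model produces.

    # Hypothetical sketch: a tiny declarative decision table of the kind
    # business rules engines have offered for decades. The process is data,
    # not ad-hoc code, and it routes whatever the model produces.

    RULES = [
        # (condition on the model's output, action to take)
        (lambda out: "refund" in out.lower(), "route_to_finance"),
        (lambda out: "urgent" in out.lower(), "escalate_to_human"),
        (lambda out: True,                    "send_automatically"),  # default
    ]

    def apply_rules(model_output: str) -> str:
        # First matching rule wins, like a classic decision table.
        for condition, action in RULES:
            if condition(model_output):
                return action
        return "send_automatically"

    print(apply_rules("Customer demands an urgent refund."))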

You're YOLOing it, and okay, that may be fine, but it may also be a colossal mistake, especially if you remove, or never had, a human in the loop.

ilaksh 2 days ago | parent

What I suggested was to use an actual agent. I also did not say there was no human in the loop.

The process is encoded in natural language and tool options.

I'm not YOLOing anything.

bonzini a day ago | parent

If there is a human in the loop, TFA does say that agents can be the solution. In fact, that's pretty much the conclusion the author reaches.

By saying "you should just use agents", anyone who has read the article will assume that you're talking about the case where there's no human in the loop.