jillesvangurp 2 hours ago

Agents are basically tool-using LLMs running in a loop: the model comes up with a plan that includes running tools, the tool output is added to the context, and it iterates until it has fulfilled some goal. It's exactly like a regular LLM chat, except that the model is chatting with itself and giving itself instructions to run particular tools.

The code to do these things is shockingly simple; the above paragraph translated into pseudocode gives you 90% of what you'd need. Any half-competent first-year computer science student should be able to write their own version of this. Except of course they should be letting LLMs do the heavy lifting here.
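To make that concrete, here is a minimal sketch of such a loop. Everything in it is an assumption for illustration: `run_agent`, the `Action` dict shape and `scripted_model` are made-up names, and the "model" is a hard-coded stand-in so the example runs without any LLM. A real agent would call an LLM API where `model(context)` is called.

```python
from typing import Callable

# Hypothetical action format: {"tool": name, "arg": text} or {"done": answer}.
Action = dict


def run_agent(goal: str,
              model: Callable[[list[str]], Action],
              tools: dict[str, Callable[[str], str]],
              max_steps: int = 10) -> str:
    """Loop: ask the model for an action, run the tool, add output to context."""
    context = [f"GOAL: {goal}"]
    for _ in range(max_steps):
        action = model(context)              # the model plans its next step
        if "done" in action:                 # the model decides the goal is met
            return action["done"]
        output = tools[action["tool"]](action["arg"])    # run the chosen tool
        # The tool output goes back into the context for the next iteration.
        context.append(f"{action['tool']}({action['arg']!r}) -> {output}")
    return "gave up: step limit reached"


def scripted_model(context: list[str]) -> Action:
    """Stand-in for an LLM: one tool call, then finish."""
    if len(context) == 1:
        return {"tool": "upper", "arg": "hello"}
    return {"done": f"finished after seeing: {context[-1]}"}


result = run_agent("shout hello", scripted_model, {"upper": str.upper})
print(result)
```

The step limit is a design choice in this sketch, not something the description above requires; it just keeps a confused model from looping forever.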

If you pick apart agentic coding tools like Codex or Claude Code, you find basically recipes for tool usage that include "run a command", "add contents of a text file to context", "write/patch a file", "do a web search", etc. The "run a command" one basically enables it to run whatever it needs without pre-programming the tool with any knowledge whatsoever.
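A rough sketch of what such a tool set could look like, using only the Python standard library. The function names and the `TOOLS` registry are hypothetical and not taken from any real agent; a real tool would also add sandboxing and permission checks, which are left out here.

```python
import subprocess
import tempfile
from pathlib import Path


def run_command(command: str) -> str:
    """Run a shell command; return its exit code and combined output."""
    proc = subprocess.run(command, shell=True, capture_output=True,
                          text=True, timeout=60)
    return f"exit={proc.returncode}\n{proc.stdout}{proc.stderr}"


def read_file(path: str) -> str:
    """Return a text file's contents so they can be added to the context."""
    return Path(path).read_text()


def write_file(path: str, content: str) -> str:
    """Overwrite a file and report what was written."""
    Path(path).write_text(content)
    return f"wrote {len(content)} characters to {path}"


# Hypothetical registry: the model picks a tool by name.
TOOLS = {
    "run_command": run_command,
    "read_file": read_file,
    "write_file": write_file,
}

# Exercise the tools the way an agent loop would, in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    target = str(Path(tmp) / "note.txt")
    write_result = TOOLS["write_file"](target, "fix my thingy")
    read_result = TOOLS["read_file"](target)
command_result = TOOLS["run_command"]("echo hello")
print(command_result)
```

Note that `run_command` knows nothing about gradle, git or anything else; whatever command string the model produces is simply executed, which is why that one tool covers so much ground.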

That knowledge all comes from training and web searches. So the "fix my thingy" prompt turns into a loop where it inspects your directory of code by listing files, reading them, and adjusting its plan. It maybe figures out it's a Kotlin project (in my case) and that it could probably try running Gradle commands to build it. Maybe there's an AGENTS.md file with some helpful information, or a README.md. It will start opening files to find your thingy and iterate on the plan. It then writes a patch and tries to build the patched code, and if the tool says thumbs up, it can create a little commit by figuring out how to run the git command.

It's like magic when you see this in action. But all the magic is in the LLM, not the tool. It works for coding, and with this kind of model anything with a UI becomes a tool that the model can use. UIs become APIs, basically.

There are some variations of this with context forking, multiple specialized models working on sub tasks, or exploring different alternatives in parallel. But the core principle is very simple.

In the broader discussion about AGI we're focused on our own intelligence, but what really empowers us is our ability to use tools. The only difference between us and a prehistoric caveman is our tools, which include everything from systems for writing things down to particle accelerators. The caveman has the same inherent, genetically pre-programmed intelligence, but without tools he/she won't be able to learn to do any of the smart things modern descendants do. If you've ever seen a toddler use an iPad, you know how right I am. Most of them play games before they figure out how to walk.

The LLM way of writing things down is "adding them to a context". Most of the tool progress right now is about making that scale better. You get buzzwords like context forking, context compression, and context caching. All of that is low-level hacks to get the LLM to track more stuff. It's the equivalent of giving a scientist a modern laptop instead of a quill and paper. Same intelligence, better tools.
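As an illustration of the "context compression" idea, here is one way it could be sketched: once the context grows past a budget, older entries are swapped for a summary while the goal and the most recent steps are kept verbatim. The names `compact` and `crude_summary` and the character budget are assumptions made up for this example; a real tool would likely count tokens and ask the LLM itself to write the summary.

```python
from typing import Callable


def compact(context: list[str],
            summarize: Callable[[list[str]], str],
            max_chars: int = 200,
            keep_recent: int = 2) -> list[str]:
    """Replace older entries with a summary once the context is too big."""
    total = sum(len(entry) for entry in context)
    if total <= max_chars or len(context) <= keep_recent + 1:
        return context                       # small enough, leave untouched
    split = len(context) - keep_recent
    goal, old, recent = context[0], context[1:split], context[split:]
    # Keep the goal and the latest steps verbatim; squash the middle.
    return [goal, "SUMMARY: " + summarize(old), *recent]


def crude_summary(entries: list[str]) -> str:
    """Stand-in summarizer; a real one would call the model."""
    return f"{len(entries)} earlier steps omitted"


history = ["GOAL: fix my thingy"] + [f"step {i}: " + "x" * 50 for i in range(6)]
compacted = compact(history, crude_summary)
for entry in compacted:
    print(entry[:40])
```

The intelligence doing the work is unchanged; the only thing this buys is room to keep going on longer tasks.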