DeathArrow 7 hours ago

I see lots of techniques proposed to give LLMs the capacity to recall things. I've also seen a lot of memory plugins for AI coding agents, and I've tried some myself.

What I want to see is something that was tested and proved in practice to be genuinely useful, especially for coding agents.

cjonas 2 hours ago | parent | next [-]

Coding agents don't really need memory. Agent skills, rules, git history, and documentation are all far more efficient, transparent, and easier to manage. These memory frameworks only really make sense if you are building a consumer-facing agent with managed context and limited capabilities.

wren6991 an hour ago | parent [-]

There's an antipattern where everyone wants to invent new interfaces to connect things to LLMs when CLI tools are already right there: transparent, and usable by humans as well as LLMs. I think it's partly a holdover from these tools' origins in web chat applications.

Beads kind of does "LLM memory over CLI", or there is https://github.com/wedow/ticket, which is a minimal and sane implementation of the same idea.
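The linked `ticket` tool's actual interface may differ, but the core of "LLM memory over CLI" can be sketched in a few lines: append-only notes in a plain text file that both agents and humans can write, read, and grep. Everything here (the `notes.log` file, the `add`/`search` subcommands) is illustrative, not the real tool's API.

```python
#!/usr/bin/env python3
"""Minimal sketch of memory-over-CLI: a plain append-only log,
queried with the same commands an agent or a human would run."""
import sys
import time
from pathlib import Path

NOTES = Path("notes.log")  # hypothetical store: one timestamped line per note


def add(text: str) -> None:
    """Append a note with a timestamp; the file stays human-readable."""
    stamp = time.strftime("%Y-%m-%d %H:%M")
    with NOTES.open("a") as f:
        f.write(f"{stamp}\t{text}\n")


def search(term: str) -> list[str]:
    """Case-insensitive substring search, like a tiny built-in grep."""
    if not NOTES.exists():
        return []
    return [line.rstrip("\n") for line in NOTES.open()
            if term.lower() in line.lower()]


if __name__ == "__main__" and len(sys.argv) > 1:
    cmd, rest = sys.argv[1], " ".join(sys.argv[2:])
    if cmd == "add":
        add(rest)
    elif cmd == "search":
        print("\n".join(search(rest)))
```

Because the store is a flat file, there is no hidden state: `cat notes.log` or plain `grep` works just as well as the tool itself, which is the transparency argument above.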

stephantul 7 hours ago | parent | prev [-]

How would you conceptualize recall in this case? Is searching through the current version of your code and possibly git history not enough?

rush86999 6 hours ago | parent [-]

You would think git history would be the first thing an agent looks at, given how many mistakes they make before arriving at the correct answer. They don't.

I haven't measured, but documenting bug fixes and architecture seems to help, along with TDD patterns, including integration tests.

I would probably add an instruction to CLAUDE.md to look for all of the above when tackling a new bug.
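Such a CLAUDE.md instruction might look something like the following sketch. The exact wording and the `docs/bugfixes.md` path are illustrative, not a recommendation from the comment:

```markdown
## When tackling a new bug

1. Run `git log --oneline -- <affected files>` and read recent fix commits
   for the same area before proposing a change.
2. Check `docs/bugfixes.md` (hypothetical path) for previously documented
   fixes and architecture notes that touch this code.
3. Reproduce the bug with a failing test first (TDD), then fix until that
   test and the integration suite pass.
```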

visarga 4 hours ago | parent | next [-]

I made a harness that preserves memory for both user messages and task execution. One reason this works is related to judge agents: they can't review information that was not written down, so I track everything in my harness.

The judge agents bring the most benefit, based on my evals. The coding agent can execute a task without all the ceremony just as well, but judging needs something to grasp on besides code, and adding new perspectives helps a lot; it is the most useful intervention.

My flow is: the user emits a task, the agent plans, judge agents review the plan, the main agent executes, then the judges review the execution. It might consume more tokens to track execution and judgements, but it's worth it.
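The flow described above can be sketched as a loop over a shared record, where every step is logged so the judges always have something written down to review. This is a minimal sketch under assumptions: `call_llm` is a stand-in for any chat-completion call, and all names are illustrative rather than the commenter's actual harness.

```python
"""Sketch of a plan -> judge -> execute -> judge harness. Judges can
only review what was logged, so every step writes to a shared record."""
from dataclasses import dataclass, field


def call_llm(role: str, prompt: str) -> str:
    # Placeholder: in a real harness this would hit an LLM API.
    return f"[{role}] response to: {prompt[:40]}"


@dataclass
class Record:
    events: list[tuple[str, str]] = field(default_factory=list)

    def log(self, kind: str, text: str) -> None:
        # Judges see only what lands here; unlogged work is invisible.
        self.events.append((kind, text))

    def transcript(self) -> str:
        return "\n".join(f"{kind}: {text}" for kind, text in self.events)


def run_task(task: str, n_judges: int = 2) -> Record:
    rec = Record()
    rec.log("task", task)

    # 1. The main agent plans.
    rec.log("plan", call_llm("planner", task))

    # 2. Judge agents review the plan, each a fresh perspective.
    for i in range(n_judges):
        rec.log("plan-review", call_llm(f"judge-{i}", rec.transcript()))

    # 3. The main agent executes with the reviewed plan in context.
    rec.log("execution", call_llm("executor", rec.transcript()))

    # 4. Judges review the execution trace as well.
    for i in range(n_judges):
        rec.log("exec-review", call_llm(f"judge-{i}", rec.transcript()))
    return rec
```

The token cost mentioned above shows up here directly: each judge call receives the full transcript so far, so the record grows with every step; the trade-off is that nothing the executor did escapes review.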

brookst 4 hours ago | parent | prev [-]

My Claude Code setup frequently looks through git history, both when planning and when debugging.