SwellJoe 8 hours ago

I know everybody seems to want the agent to remember every conversation they've ever had with it, but I just don't see the value in that. In fact, it seems to hurt productivity to have the agent second-guessing me based on something I said yesterday. Every time I've used any memory system, the agent gets distracted from the current task by previous conversations and branches of development...often commingling unrelated projects (I work on code for work, open source projects, a bunch of unrelated side projects, etc.) and trying to satisfy requirements that don't make sense.

I've stopped trying to achieve general "memory". I just ask the agent to thoroughly, but concisely, document each project. If it writes developer documentation and a development plan/roadmap, as though a person was going to have to get up to speed and start working on the project, it provides all the information the agent needs tomorrow or next week to pick up where we left off.

The agent is not my friend. I don't need it to remember my birthday or the nasty thing I said about React last week. I need it to document what anyone, agent or human, would need to know to get productive in a particular repo, with no previous knowledge of the project.

Good, concise developer and user documentation and a plan with checklists solves every problem people seem to think "memory" will solve: It tells the agent what tech stack to use (we hashed it out in planning), it tells it the commands needed to run and test the app, it covers the static analysis tools in use (which formalizes code style, etc. in a way a vague comment I made a month ago cannot), and it is cheap. Markdown files are the native tongue of agents. No MCP, no skills, no API needed. Just read the file. It works for any agent, any model, and any human just getting started with the project.
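Concretely, I mean something as plain as this at the repo root (just a sketch; the file name, stack, and commands are whatever the project actually uses):

    # DEVELOPMENT.md

    ## Stack
    Python 3.12 + FastAPI + SQLite. Hashed out in planning; don't change without discussion.

    ## Commands
    make dev    # run the app locally
    make test   # full test suite
    make lint   # ruff + mypy; must pass before any commit

    ## Roadmap
    - [x] Auth and sessions
    - [ ] Billing webhooks
    - [ ] Admin dashboard

That's the whole "memory" system: the agent reads it at the start of a session and picks up where we left off.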

Basically, I think memory makes agents dumber and less useful. I want it to focus on the task at hand.

mtrifonov 39 minutes ago | parent | next [-]

You're right but I think you're describing flat memory. The agent gets distracted because every old fact has the same weight as the current one. That's a salience problem.

What works in production for me is typed memory with very different decay curves. Personality and relationships are essentially permanent. Preferences fade in months. Stated intent fades in weeks. Emotion and events fade in days. Reinforcement (repeated recall) keeps things alive regardless of type.
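Mechanically it's nothing exotic. A toy sketch of the scoring (not the production code; the type names and half-lives are just illustrative):

    import math
    import time

    # Illustrative half-lives per memory type, in days.
    HALF_LIFE_DAYS = {
        "personality": math.inf,   # essentially permanent
        "relationship": math.inf,
        "preference": 90,          # fades over months
        "intent": 14,              # fades over weeks
        "emotion": 3,              # fades over days
        "event": 3,
    }

    def salience(mem_type: str, base_weight: float, last_recalled_ts: float) -> float:
        # Exponential decay since the last recall.
        half_life = HALF_LIFE_DAYS[mem_type] * 86400
        if math.isinf(half_life):
            return base_weight
        age = max(0.0, time.time() - last_recalled_ts)
        return base_weight * 0.5 ** (age / half_life)

    def recall(memory: dict) -> dict:
        # Reinforcement: recalling a memory resets its clock, so repeatedly
        # recalled items stay salient regardless of type.
        memory["last_recalled_ts"] = time.time()
        return memory

At retrieval time you rank by salience and only inject what clears a threshold, so yesterday's one-off remark stops competing with today's task.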

Cross-project commingling stops because project-specific stuff actually decays out of relevance while who the user is persists. There's also a filter on what even gets written, which distinguishes globally relevant from locally relevant information and writes accordingly (if at all). Most of the noise you're describing comes from systems that store everything they observe.

Flat memory failing is real. Memory failing in general is a stronger claim than that.

SwellJoe 13 minutes ago | parent [-]

I'm making the stronger claim. I don't think memory (at least, what people call "memory", even though it isn't...the memories LLMs have are baked in at training, everything else is context), no matter how fancy, improves outcomes, at least for the work I do on the software I work on. I just don't think the agent needs what people are calling memory.

I think the base truth is the code, which can be loaded into context at no greater cost than whatever "memory" system you're using, probably lower cost, actually. A few hints in documentation fills out the rest of the picture.

You can't realistically give an LLM memory, as current technology doesn't allow retraining the model on the fly. You can only give it more data to ingest into its context. Unless that data is directly relevant to the task at hand, it's probably detrimental. At best, it is just burning tokens for no benefit.

netcan 8 minutes ago | parent [-]

Useful comment. Thanks.

pil0u 7 hours ago | parent | prev | next [-]

I appreciate your comment, and can relate. I tested a couple of "memory" systems, some doing heavy lifting or seemingly implementing theories (layering, hot memory, etc.), and I can't really tell if they improve performance, quality, or reliability on a task. But they do increase the overhead, for the LLM and for me, that's for sure.

One problem I have is that CLAUDE.md files and skills now tend to get version-controlled within projects; I suspect they could get in the way sometimes.

There is already so much fatigue induced by these systems, adding another one willingly does sound crazy.

ohNoe5 6 hours ago | parent | prev | next [-]

Yeah it's that lack of perfect recall, imo, that gives rise to intelligence and progress.

If we humans just did exactly what we did yesterday, what progress?

It's baked into the immutable constants of the universe for us: entropy, signal attenuation over distance... information breaks down over time.

Because of this, all human social statistics trend towards zero without intentional conservatism. Progress or collapse is all the universe affords. It doesn't seem interested in conservatism at all.

hbarka 5 hours ago | parent | prev | next [-]

You still have to worry about handing off state to the next session, but you don't want it loading your stack of documents ("just naively read the files") at every turn. That goes against the idea of progressive disclosure, and progressive disclosure scales.

hellohello2 7 hours ago | parent | prev | next [-]

I can't see any value in having a global memory either, but I can see the value of a local memory for a specific line of work. E.g., when implementing several related features in a row, you want the agent to remember what it did in the last chat.

giancarlostoro 7 hours ago | parent | prev | next [-]

I prefer ticketing systems for AI. I don't care that it forgets what I did last week, I just need it to be able to compact its own memory and grab the next task once done.

SwellJoe 7 hours ago | parent [-]

I'm ambivalent about that. I've seen people use Beads, and they're just making busy work for the agents, splitting stuff up into tiny tasks that could have been one-shotted as part of the larger plan. They seem to just enjoy making thinky machine go brrr, even when it makes the work take longer and burn a lot more tokens.

I tend to think developing with agents should look a lot like managing a human (like, I use feature-branch development with PRs and review them, even on my own projects that have no other devs and don't need a paper trail for security audit purposes), so I theoretically can get down with an issue-based process, but thus far I haven't seen it done in a way that isn't just making busy work for agents.

giancarlostoro 5 hours ago | parent [-]

I started with Beads, then wound up building my own:

https://github.com/Giancarlos/guardrails

Key things: I added a concept called "gates", which are tied to all tasks and force the agent to satisfy arbitrary requirements such as: ensure it still runs/compiles, run all tests and make sure they pass, review existing tests critically and point out if they're not comprehensive enough, and finally, get human confirmation on the task. Until the human confirms, the agent just works on another task, and so on.

I didn't like that Beads was built on top of Git; I don't always work on git-friendly projects, and Beads kept getting messed up if I switched branches. So I made mine SQLite-based. I also made it so you can sync to GitHub issues, and sync pre-existing (and new) GitHub issues as guardrails tasks to be worked on; the agent will even leave a comment for you on GitHub when it grabs an issue, to let others know the work is potentially being done.
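The shape of it is roughly this (just a sketch of the gate concept in Python, not the actual guardrails code or schema; the real thing lives in SQLite):

    from dataclasses import dataclass, field

    @dataclass
    class Gate:
        name: str              # e.g. "still compiles", "tests pass", "human confirmation"
        passed: bool = False

    @dataclass
    class Task:
        title: str
        gates: list[Gate] = field(default_factory=list)

        def done(self) -> bool:
            # A task only counts as done when every gate has passed,
            # including the human-confirmation gate.
            return all(g.passed for g in self.gates)

    def next_open_task(tasks: list[Task]) -> Task | None:
        # Until a human confirms the current task, the agent just moves on
        # to another open one.
        return next((t for t in tasks if not t.done()), None)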

waterproof 3 hours ago | parent [-]

Nice concept! Beads did not age all that well, and Claude doesn't really want to use it since the TodoList upgrade.

Do you have any tricks for getting Claude to use guardrails effectively alongside (or instead of) TodoList?

giancarlostoro an hour ago | parent [-]

They work hand in hand, to be honest, because Claude will read tickets that match the criteria of what I'm looking to work on and tack them onto its todo list; it just becomes an overview of my tasks.

mrits 7 hours ago | parent | prev | next [-]

I'm just thinking of YouTube- or Amazon-type algorithms applying here.

me: "Hi AI, can you debug this SQL Statement?"

ai: "Well,based on your passion for garden hoses and extensive research of refrigerators, I'm going to guess you really want to discuss that"

staticassertion 6 hours ago | parent [-]

I've had to remove any of the "knowledge" about me from any agent I use. "As a security engineer, blah blah blah" or "as a rust developer blah blah blah" even though my questions have nothing to do with those topics, and they're a huge distraction.

SwellJoe 6 hours ago | parent [-]

Yeah, I've disabled memory in everything I use. It's super distracting to have it infer connections between conversations where there are none. It's also kind of sleazy feeling. Like, manipulative in the sense that it thinks it knows what I'm into, so it's going to weave that into the conversation.

If we didn't have evidence that these things cause something like psychosis in some people, it'd seem innocent. But, since the sycophancy combines with the long-term relationships some people think they're having with matrix math to trigger serious mental health problems, it feels more sinister.

Anyway, having a long-term memory makes them dumber and more easily confused. I don't have any use for a dumb agent.
