| ▲ | joshstrange 10 hours ago | |
I’m pretty divided on “memory”. There are times it can feel almost magical, but more often than not I feel like I am fighting with the steering wheel. Whenever I’m in a conversation and it references something unrelated (or even related) I get the “ick”. I know how context poisoning (intentional or not) works, and I work hard to only expose things to the model that I want it to consider. There have been many times that I’ve started a fresh chat so as not to bring along the baggage (or wrong turns) of a previous chat, but then it will say “And this should work great for <thing I never mentioned in THIS chat>”, and at that moment my spidey-sense tingles and I start wondering, “Crap, did it come to the conclusion it did based mostly/only on the new context, or did it ‘take a shortcut’ and use context from another chat?” Like I said, I go out of my way to not “lead the witness”, and so when the “witness” can peek at other conversations, all my caution is for naught.
I encourage everyone to go read the saved memories in their LLM of choice; I’ve cleaned out complete crap from there multiple times: actually wrong information, confusing information, or one-off things I don’t want influencing future discussions.
The custom (or rather addition to the) system prompt is all I feel comfortable with. That is where I give it some basic info about the coding language I prefer and the OSes that I’m often working with so that I don’t have to constantly say “actually this is FreeBSD” or “please give that to me in JS/TS instead of Python”.
The only thing that has, so far, kept me from turning off memory is that I’m always slightly cautious of going off the beaten path for something so new and moving so fast. I often want to stay as close to the “stock” config as possible, since I know how testing/QA works at most places (the further off the beaten path you go, the more likely you’ll run into bugs). Also so that I can experience what everyone else is experiencing (within reason).
Lastly, because, especially with LLMs, I feel like the people that over-customize end up with fragile systems. I think that a decent portion of the “N+1 model is dumber” or “X model has really gone downhill” complaints is partially due to complicated configs (system prompts, MCP, etc.) that might have helped at some point (dumber model, less capability) but are a hindrance to newer models. That or they never worked and someone just kept piling on more and more thinking it would help. | ||
| ▲ | rudedogg 6 hours ago | parent [-] | |
I've been thinking this too. I frequently do deep research on some systems programming technique, ask it to generate a .md for it, and then I use that in later sessions with Claude Code: "look at the research I collected in {*-research}.md and help me explore ways to apply it to {thing}". At the research step it frequently (always?) uses memory to direct/scope the research to what I typically work on, but I think that kind of pigeonholes the model and what it explores. And the memory doesn't quite capture all the areas I'm interested in, or want to directly apply the research to.
And regarding the crap in memories, I found the same. Mine at work mentioned I'm an expert in a business domain I have almost zero experience with.
I feel like the companies building this stuff accept a lot of "slop" in their approach, and just can't see past building things by slopping stuff into prompts. I wish they'd explore more rigid approaches. Yes, I understand "the bitter lesson", but it seems obvious to me that some traditional approaches would yield better results for the foreseeable future. Less magic (which is just running things through the cheapest model they have and dumping the result in every chat). It seems like poison.
Related: https://vercel.com/blog/agents-md-outperforms-skills-in-our-...
Also, agent skills are usually pure slop. If you look through https://skills.sh on a framework/topic you're knowledgeable in, you'll be a bit disheartened. This stuff was pioneered by people who move fast, but I think it's now time to try and push for quality and care in the approach, since these tools have gotten good enough to contribute to more than prototype work. | ||