| ▲ | gbnwl 7 hours ago |
| I'm not sure how many HN users frequent other places related to agentic coding, like the subreddits of particular providers, but this has got to be the 1000th "ultimate memory system"/break-free-of-the-context-limit-tyranny! project I've seen, and like all the similar projects there's never any evidence, or even an attempt to measure, any metric of performance it improves. Of course it's hard to measure such a thing, but that's exactly part of why it's hard to build something like this. Here's user #1001 who's been told by Claude: "What a fascinating idea! You've identified a real gap in the market for a simple database-based memory system to extend agent memory." |
|
| ▲ | beefsack 4 hours ago | parent | next [-] |
| I feel like so many of these memory solutions are incredibly over-engineered too. You can work around a lot of the memory issues for large and complex tasks just by making the agent keep work logs. Critical context to keep throughout large pieces of work includes decisions, conversations, investigations, plans, and implementations; a normal developer should be tracking these anyway, and it's sensible to have the agent track them too, in a way that survives compaction. |
| |
| ▲ | SkyPuncher an hour ago | parent | next [-] | | Yep. I just have my agents write out key details to a markdown file. Doesn’t have to be perfect. Just enough to reorient itself to a problem. | |
| ▲ | ramoz 4 hours ago | parent | prev [-] | | Great advice. For large plans I tell the agent to write to an “implementation_log.md” and make note of it during compaction. Additionally, the agent can just reference the original session logs. |
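For what it's worth, a work log like this doesn't need any tooling beyond a few lines of helper code the agent can call. A minimal sketch, reusing the `implementation_log.md` name from the comment above; the entry fields (kind, summary, details) are just illustrative:

```python
from datetime import datetime, timezone
from pathlib import Path

LOG = Path("implementation_log.md")

def log_entry(kind: str, summary: str, details: str = "") -> None:
    """Append a timestamped entry (decision, plan, investigation, ...) to the log."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    entry = f"\n## {stamp} [{kind}]\n\n{summary}\n"
    if details:
        entry += f"\n{details}\n"
    with LOG.open("a", encoding="utf-8") as f:
        f.write(entry)

# Example entry; in practice the agent is instructed to emit these itself.
log_entry("decision", "Use SQLite for session storage",
          "Postgres is overkill for a single-user CLI.")
```

Because the log is append-only markdown, it survives compaction and the agent can re-read it to reorient itself.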
|
|
| ▲ | stingraycharles 22 minutes ago | parent | prev | next [-] |
| imho, if it’s not based on RAG, it’s not a real memory system. the agent often doesn’t know what it doesn’t know, so relevant memories must be pushed into the context window by embedding distance, not actively looked up. |
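To make "pushed in by embedding distance" concrete, here is a toy sketch. A real system would use a learned embedding model and a vector store; the bag-of-words "embedding" below is purely illustrative, but the retrieval shape is the same:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a sparse bag-of-words vector. A real memory system
    # would call an embedding model here instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recall(memories: list[str], query: str, k: int = 2) -> list[str]:
    # Push the k nearest memories into the prompt. The agent never has to
    # know they exist, which is the point: it can't look up what it
    # doesn't know it forgot.
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(embed(m), q), reverse=True)
    return ranked[:k]

memories = [
    "auth tokens are rotated by the cron job in scripts/rotate.py",
    "the staging database is reset every night",
    "we decided against GraphQL in March",
]
print(recall(memories, "why did my auth token stop working", k=1))
```

The auth memory ranks first for that query even though the agent never asked for it by name.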
|
| ▲ | austinbaggio 6 hours ago | parent | prev | next [-] |
| Which of the 1000 is your favorite? There does seem to be a shallow race to optimize some xyz benchmark for a narrow sliver of the context problem, but you're right, the context problem space is big, so I don't think we'll hurry to join that narrow race. |
| |
| ▲ | gbnwl 6 hours ago | parent | next [-] | | | Which of the 1000 is your favorite? None, that's what I'm trying to say. My favorite is just storing project context locally in docs that agents can discover on their own, or that I can point to if needed. This doesn't require me to upload sensitive code or information to anonymous people's side projects, has an equivalent amount of hard evidence for efficacy (zero), but at least has my own anecdotal evidence of helping, and doesn't invite additional security risk. People go way overboard with MCPs and armies of subagents built on wishes and unproven memory systems, because no one really knows for sure how to get past the spot we all hit, where the agentic project that was progressing perfectly hits a sharp downtrend in progress. That doesn't mean it's time to send our data to strangers. | | |
| ▲ | gck1 an hour ago | parent [-] | | > no one really knows for sure how to get past the spot we all hit where the agentic project that was progressing perfectly hits a sharp downtrend in progress. FWIW, I find this eventual degradation point comes much later and with fewer consequences when there are strict guardrails inside and outside of the LLM itself. From what I've seen, most people try to fix only the "inside" part: by tweaking the prompts, installing 500 MCPs (that ironically pollute the context and make the problem worse), yelling in uppercase in the hope that it will remember, etc., while ignoring that automated compliance checks existed way before LLMs. Throw the strictest and most masochistic linting rules at it in a language that is masochistic itself (e.g. Rust), add tons of integration tests that encode intent, add a stop hook in CC that runs all these checks, and you've got a system that is simply not allowed to silently drift and can put itself back on track with the feedback it gets from them. Basically, rather than trying to hypnotize an agent into remembering everything by writing a 5000-line agents.md, just let the code itself scream at it and feed the context. |
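The "outside" guardrail reduces to a small script: run the strict checks, and if any fail, surface their output so the agent can course-correct from it. A hedged sketch; the check commands below are placeholders, and the actual hook wiring (when this script runs and how its result blocks the agent) is specific to your agent harness:

```python
import subprocess
import sys

# Placeholder check commands; swap in your own linters and test suites.
CHECKS = [
    ["cargo", "clippy", "--", "-D", "warnings"],
    ["cargo", "test"],
]

def run_checks(checks=CHECKS) -> int:
    """Run each check and echo failures so the agent sees them as feedback.

    Returns the number of failed checks. Wired into a stop hook, a nonzero
    result is what keeps the agent working while checks still fail.
    """
    failed = 0
    for cmd in checks:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            failed += 1
            print(f"CHECK FAILED: {' '.join(cmd)}", file=sys.stderr)
            print(proc.stdout + proc.stderr, file=sys.stderr)
    return failed
```

The key design choice is deterministic feedback: the agent isn't trusted to remember the rules, it's forced to read the failures.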
| |
| ▲ | Davidzheng 4 hours ago | parent | prev [-] | | Wait. |
|
|
| ▲ | AndyNemmity 6 hours ago | parent | prev | next [-] |
| The funny part is, the vast majority of them are barely doing anything at all. All of these systems are for managing context. You can generally tell which ones are actually doing something: they're the ones using skills with programs in them, because then you're actually attaching some sort of feature to the system. Otherwise you're just feeding in different prompts and steps, which can add some value, but okay, it doesn't take much to do that. Like adding image generation to Claude Code with Google's nano banana via a Python script that does it. That's actually adding something Claude Code doesn't have, instead of just saying "You are an expert in blah" |
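The distinction here is prompts versus deterministic programs. Image generation needs an external API, so this sketch uses a different, self-contained capability to show the same shape: a plain CLI the agent shells out to. The checksum tool is entirely hypothetical, chosen only because it's something a model can't compute reliably on its own:

```python
import argparse
import hashlib

def file_digest(path: str) -> str:
    """Deterministically hash a file: same input, same answer, every time."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def main(argv=None) -> int:
    # The agent invokes this as `python digest.py <path>` when it needs
    # a capability the model itself cannot provide reliably.
    parser = argparse.ArgumentParser(description="Print a file's sha256 digest.")
    parser.add_argument("path")
    args = parser.parse_args(argv)
    print(file_digest(args.path))
    return 0
```

The value comes from the program being deterministic; the LLM only decides when to call it, not what the answer is.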
| |
| ▲ | austinbaggio 6 hours ago | parent [-] | | It sounds like you've used quite a few. What programs are you expecting? Assuming you're talking about doing some inference on the data? Or optimizing for some RAG or something? | | |
| ▲ | AndyNemmity 6 hours ago | parent [-] | | An example of a skill I gave: adding image generation with nano banana. Another is one Claude Code ships with, using ripgrep. Those are actual features. It's adding deterministic programs that the LLM calls when it needs something. | | |
| ▲ | austinbaggio 5 hours ago | parent [-] | | Oh got it - tool use | | |
| ▲ | AndyNemmity 5 hours ago | parent [-] | | Exactly. That adds actual value. Some of the 1000s of projects do this, and those pieces add value, provided the tool itself adds value, which also isn't a given. |
|
|
|
|
|
| ▲ | willtemperley 2 hours ago | parent | prev | next [-] |
| Replace "Automation" with "Agentic coding" here: https://xkcd.com/1319/ |
|
| ▲ | Forgeties79 6 hours ago | parent | prev | next [-] |
| Have you tried using it? Not being flippant and annoying. Just curious if you tried it and what the results were |
| |
| ▲ | Game_Ender 5 hours ago | parent | next [-] | | Why should he put effort into measuring a tool that the author has not? The point is that there are so many of these tools that an objective measure, one the creators could compare against each other, would be better. So a better question to ask is: do you have any ideas for an objective way to measure the performance of agentic coding tools, so we can truly determine what improves performance and what doesn't? I would hope that internally, OpenAI and Anthropic use something similar to the harness/test cases they use for training their full models to determine whether changes to Claude Code result in better performance. | | |
| ▲ | morkalork 3 hours ago | parent [-] | | Well, if I were Microsoft training Copilot, I would log all the <restore checkpoint> user actions and grade the agents on that. At scale, across all users, "resets per agent command" should be useful. But then again, publishing the true numbers might be embarrassing... | | |
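The proposed metric is trivial to compute once the events are logged. A sketch with hypothetical event names; "agent_command" and "restore_checkpoint" are made up for illustration, and real data would come from product telemetry:

```python
from collections import Counter

# Hypothetical telemetry events; real data would come from product logs.
events = [
    {"user": "a", "type": "agent_command"},
    {"user": "a", "type": "restore_checkpoint"},
    {"user": "a", "type": "agent_command"},
    {"user": "b", "type": "agent_command"},
    {"user": "b", "type": "agent_command"},
]

def resets_per_command(events) -> float:
    """Fraction of agent commands that the user answered with a reset."""
    counts = Counter(e["type"] for e in events)
    commands = counts["agent_command"]
    return counts["restore_checkpoint"] / commands if commands else 0.0

print(resets_per_command(events))  # 1 reset across 4 commands -> 0.25
```

The raw ratio is noisy, since resets aren't always failures, so it would need segmenting before it means much.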
| ▲ | kuboble 33 minutes ago | parent [-] | | I'm not sure it's a good signal. I often restore a conversation checkpoint after successfully completing a side quest. |
|
| |
| ▲ | gbnwl 6 hours ago | parent | prev [-] | | Who has time to try this when there's this huge backlog here: https://www.reddit.com/r/ClaudeAI/search/?q=memory | | |
| ▲ | Forgeties79 6 hours ago | parent | next [-] | | Have you tried any of those? | | |
| ▲ | gbnwl 5 hours ago | parent [-] | | Yes, they haven't helped. Have you found one that works for you? | | |
| ▲ | austinbaggio 5 hours ago | parent [-] | | What are you both looking for? What is the problem you want solved? | | |
| ▲ | ggm 4 hours ago | parent [-] | | Is a series of postings all in the form of questions an indication somebody hooked "eliza" up as an input device? | | |
| ▲ | morkalork 3 hours ago | parent [-] | | Nah, just another one of those spam bots on all the small-business, finance and tradies sub-reddits: "Hey fellow users, have you ever suffered from <use case>? What is the problem you want solved? Tell me your honest opinions below!" |
|
|
|
| |
| ▲ | 6 hours ago | parent | prev [-] | | [deleted] |
|
|
|
| ▲ | johnnyfived 5 hours ago | parent | prev [-] |
| I imagine HN, despite being full of experts and veteran devs, also has a prevalent attitude of looking down on tools like MCP servers or agentic AI coding libraries, which might be why something advertised like this seems novel rather than redundant. |
| |
| ▲ | DrewADesign 4 hours ago | parent [-] | | > I imagine HN, despite being full of experts and vet devs, also might have a prevalent attitude of looking down on using tools like MCP servers or agentic AI libraries for coding, which might be why something like this advertised seems novel rather than redundant. I’m not sure where the ‘despite’ comes in. Experts and vets have opinions and this is probably the best online forum to express them. Lots of experts and vets also dislike extremely popular unrelated tools like VB, Windows, “no-code” systems, and Google web search… it’s not a personality flaw. It doesn’t automatically mean they’re right, either, but ‘expert’ and ‘vet’ are earned statuses, and that means something. We’ve seen trends come and go and empires rise and fall, and been repeatedly showered in the related hype/PR/FUD. Not reflexively embracing everything that some critical mass of other people like is totally fine. | | |
| ▲ | gbnwl 4 hours ago | parent [-] | | I think the point they were trying to make is that despite people on HN being very technically experienced, skepticism and distrust of LLM-assisted coding tools may have kept many of them from exploring the space too deeply yet. So a project like this may seem novel to many readers here, when the reality, for users who've been following tools like Claude Code (and similar) closely for a while now, is that claims like the ones this project makes come out multiple times per week. | | |
| ▲ | johnnyfived 4 hours ago | parent [-] | | They pretty much perfectly encapsulated the point in their fired up response haha. |
|
|
|