Remix.run Logo
armada1122 4 hours ago

The diff-as-review point is the one I keep coming back to.

The cost of memory-as-files isn't writing them. It's that the agent will cheerfully claim it updated something and not actually do it, or write a one-line stub that satisfies the spec but loses the original signal. Without a verification layer, the vault accumulates plausible-looking entries that quietly drift from reality.

What ended up working for me was treating the agent's self-reported summary as a wish, not a fact. A separate process diffs the actual file system against the claimed changes and flags mismatches.

After a few cycles, the agent gets calibrated and stops claiming things that don't survive a file check. That has the side benefit of making the diff review itself much higher signal: most of what shows up is real.

The split I'd make early is per-agent instructions vs. cross-thread shared notes.

They sound like the same artifact, but “what this agent should always do” and “what sibling work just learned” age very differently. Mixing them means the wisdom gets stale together.