Chasing AI Memory SOTA: Beating the Benchmark, Missing the Point (xmemory.ai)
4 points by alex_petrov 5 hours ago | 2 comments
alex_petrov 5 hours ago
66.88%. 80.1%. 85%. 90.79%. 93%. 100%. These are all SOTA scores on agentic memory benchmarks. None of them tells you whether the system will work in production. The deeper problem isn't the data; it's that we often misunderstand what these numbers actually measure.

In our recent white paper we open-sourced datasets that target specific memory functions. Today we published a follow-up that explains why we think the well-known agentic memory benchmarks (LoCoMo, LongMemEval) miss the mark for production systems, and what we measure instead.

We're in a field that is measuring itself against itself. The real question isn't "are we beating last week's leaderboard?" but "are we building something that makes people's work meaningfully better?" That's harder to measure. It's also the only thing that matters.
norikaoda 5 hours ago
[flagged]