quinncom 2 hours ago

Don’t presume this study has anything to do with programming. They measured an agent’s ability to search long conversations, not code.

> We evaluate on a 116-question representative subset of the LongMemEval benchmark (Wu et al., 2025), which tests an agent’s ability to answer questions over long conversations spanning multiple sessions.

schipperai 21 minutes ago

I get the sense that I was click-baited by the article's title, with its classic "X is all you need" trope. This research is a solid contribution, but it is far from all we need to understand grep vs. semantic search in agent retrieval.