quinncom 2 hours ago
Don’t presume this study has anything to do with programming. They measured an agent’s ability to search long conversations, not code.

> We evaluate on a 116-question representative subset of the LongMemEval benchmark (Wu et al., 2025), which tests an agent’s ability to answer questions over long conversations spanning multiple sessions.
schipperai 21 minutes ago
I get the sense that I was click-baited by the article's title, with its classic "X is all you need" trope. This research is a solid contribution, but it is far from all we need to understand grep vs. semantic search in agent retrieval.