	▲	quinncom 2 hours ago
		Don’t presume this study has anything to do with programming. They measured an agent’s ability to search long conversations, not code.
		> We evaluate on a 116-question representative subset of the LongMemEval benchmark (Wu et al., 2025), which tests an agent’s ability to answer questions over long conversations spanning multiple sessions.
	▲	schipperai 21 minutes ago
		I get the sense that I was click-baited by the article's title, with its classic "X is all you need" trope. This research is a solid contribution, but it is far from all we need to understand grep vs. semantic search in agent retrieval.