mikeayles 12 hours ago
I benchmarked Claude Code and GitHub Copilot, both running the same model (Haiku 4.5), with and without RAG-powered semantic search, across 60 queries on a real codebase. On Claude Code, RAG didn't make search more accurate, but it cut token consumption by 28%. On Copilot, it cut time to resolution by 44% and improved F1 by 19.5%. The bigger finding: with the model held constant, tool design alone accounts for a 30-percentage-point recall gap between the two tools. The benchmark code and data are open source.
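For anyone wanting to sanity-check numbers like these, here's a minimal sketch of how per-query precision, recall and F1 could be scored, plus the difference between a relative change ("cut by 28%") and an absolute gap in percentage points ("30pp"). It assumes each query has a hand-labeled set of relevant files; `QueryResult`, `precision_recall_f1`, `mean_scores`, `relative_change` and `pp_gap` are hypothetical names for illustration, not the benchmark's actual code.

```python
from dataclasses import dataclass


@dataclass
class QueryResult:
    # Hypothetical record: files the tool surfaced vs. ground-truth files for one query.
    retrieved: set[str]
    relevant: set[str]


def precision_recall_f1(r: QueryResult) -> tuple[float, float, float]:
    """Set-based precision, recall and F1 for a single query."""
    hits = len(r.retrieved & r.relevant)
    precision = hits / len(r.retrieved) if r.retrieved else 0.0
    recall = hits / len(r.relevant) if r.relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def mean_scores(results: list[QueryResult]) -> tuple[float, float, float]:
    """Macro average: mean of the per-query precision, recall and F1 values."""
    if not results:
        return (0.0, 0.0, 0.0)
    scores = [precision_recall_f1(r) for r in results]
    n = len(scores)
    return (
        sum(s[0] for s in scores) / n,
        sum(s[1] for s in scores) / n,
        sum(s[2] for s in scores) / n,
    )


def relative_change(before: float, after: float) -> float:
    """Relative change, e.g. tokens going 10000 -> 7200 gives -0.28 (a 28% cut)."""
    return (after - before) / before


def pp_gap(a: float, b: float) -> float:
    """Absolute gap in percentage points between two rates expressed in [0, 1]."""
    return (a - b) * 100


if __name__ == "__main__":
    q = QueryResult(retrieved={"a.py", "b.py"}, relevant={"a.py", "c.py", "d.py"})
    print(precision_recall_f1(q))
    print(relative_change(10_000, 7_200))
    # Same pair of recall values, two very different-looking numbers:
    print(pp_gap(0.8, 0.5), relative_change(0.5, 0.8))
```

The last line is the point of the sketch: recall moving from 0.5 to 0.8 is a 30pp gap but a 60% relative increase, so percentage figures are only comparable once you know which kind each one is.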