energy123 7 hours ago
Their definition of context excludes prescriptive specs/requirements files. They are only talking about a file that summarizes what exists in the codebase, which is information the agent can otherwise discover through CLI tools (ripgrep, etc.), and it has been trained to do that as efficiently as possible. Also important to note that human-written context did help according to them, if only a little bit. Effectively what they're saying is that feeding the agent an LLM-generated summary of the codebase didn't help, which isn't that surprising.
MITSardine 5 hours ago
I find it surprising. The piece of code I'm working on is about 10k LoC defining the basic structures and functionality, and I found Claude Code would systematically spend significant time and tokens exploring it before adding even basic functionality. Part of the issue is that it deals with a problem domain LLMs don't seem to be well trained on, so they have to take it all in; they don't seem to know what to look for in advance. I went through a couple of iterations of the CLAUDE.md file: first describing the problem domain and library intent (that helped target searches better, since it had keywords to go by; note that a domain-trained human would know these in advance from the three words that comprise the library folder name), and finally adding a concise per-function doc of the most frequently used bits. I find I can now launch CC on a simple task without it spending minutes reading the codebase before getting started.
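For readers who haven't written one: a minimal sketch of the structure described above. All domain wording, paths, and function names here are hypothetical placeholders, not the commenter's actual file.

```markdown
# CLAUDE.md (illustrative sketch only)

## Problem domain and intent
One or two sentences naming the domain and what the library is for,
written with the domain's own terminology so the agent has keywords
to search for instead of reading everything.

## Layout (placeholder paths)
- src/core/  - basic data structures
- src/ops/   - operations built on top of them

## Frequently used functions (concise per-function doc, hypothetical names)
- build_index(data) -> Index   : construct the main lookup structure
- query(index, key) -> Result  : look up a key in an existing index
- merge(a, b) -> Index         : combine two indexes
```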
nielstron 7 hours ago
Hey, a paper author here :) I agree that if you know LLMs well, it shouldn't be too surprising that autogenerated context files don't help - yet this is the default recommendation from major AI companies, which is exactly what we wanted to scrutinize.

> Their definition of context excludes prescriptive specs/requirements files.

Can you explain a bit what you mean here? If the context file specifies a desired behavior, we do check whether the LLM follows it, and this generally seems to work (Section 4.3).