▲ deaux 5 hours ago
I read the study. I think it does the opposite of what the authors suggest - it's actually vouching for good AGENTS.md files.

> Surprisingly, we observe that developer-provided files only marginally improve performance compared to omitting them entirely (an increase of 4% on average), while LLM-generated context files have a small negative effect on agent performance (a decrease of 3% on average).

This "surprisingly", and the framing, seem misplaced. For the developer-made ones: a 4% improvement is massive! A 4% improvement from a simple markdown file means it's a must-have.

> while LLM-generated context files have a small negative effect on agent performance (a decrease of 3% on average)

This should really be "while the prompts used to generate AGENTS files in our dataset...". The files are a proxy for the prompts that produced them; who knows whether ones generated with a better prompt would show improvement.

The biggest use case for AGENTS.md files is domain knowledge that the model is not aware of and cannot instantly infer from the project. That knowledge is gained slowly over time, from seeing the agents struggle due to this deficiency. It's exactly the kind of thing that is very common in closed-source code, yet incredibly rare in the public GitHub projects that have an AGENTS.md file - the huge majority of which are recent, small, vibecoded projects centered around LLMs. If 4% gains are seen even on the latter kind of project, which will have very mixed AGENTS.md quality in the first place, then for bigger projects with high-quality .md's they're invaluable when working with agents.
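A minimal sketch of the kind of domain-knowledge AGENTS.md meant here - the project details, paths, and rules below are entirely hypothetical, just illustrating facts an agent could not infer from the code alone:

```markdown
# AGENTS.md

## Domain knowledge (not inferable from the code)
- Prices are stored as integer cents; never introduce floats in `billing/`.
- `LegacyOrder` and `Order` are distinct models on purpose; do not merge
  or "deduplicate" them.
- All `sync_*` jobs are idempotent by design; retries are expected and safe.

## Known agent failure modes (accumulated over time)
- Agents keep "fixing" the intentional off-by-one in `pagination.py` - leave it.
- Do not add type hints to `vendor/`; it is vendored upstream code.
```

Each bullet here encodes exactly the kind of lesson learned from watching an agent go wrong once, which is why such files accrue slowly in closed-source work.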
▲ nielstron 4 hours ago
Hey, thanks for your review - a paper author here.

Regarding the 4% improvement for human-written AGENTS.md: this would indeed be huge if it were a _consistent_ improvement. However, on Sonnet 4.5, for example, performance _drops_ by over 2%. Qwen3 benefits most, and GPT-5.2 improves by 1-2%.

The LLM-generated prompts follow the coding agents' recommendations. We also show an ablation over different prompt types, and none has consistently better performance.

But ultimately I agree with your post. In fact, we do recommend writing good AGENTS.md files - manually and in a targeted way. This is emphasized, for example, at the end of our abstract and conclusion.
▲ giancarlostoro an hour ago
> The biggest use case for AGENTS.md files is domain knowledge that the model is not aware of and cannot instantly infer from the project. That is gained slowly over time from seeing the agents struggle due to this deficiency.

This. I have Claude write about the codebase because I get tired of it grepping files constantly. I'd rather it just know "these files are for x, these files have y methods", and I even have it break down larger files so everything fits in the context window several times over. Funnily enough, this makes it easier for humans to parse too.
▲ SerCe 2 hours ago
In Theory There Is No Difference Between Theory and Practice, While In Practice There Is.

In large projects, having a specific AGENTS.md makes the difference between the agent spending half of its context window searching for the right commands, navigating the repo, and figuring out what is what - and being extremely useful. The larger the repository, the more things the agent needs to be aware of, and the more important the AGENTS.md is. At least that's what I have observed in practice.
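For the large-repo case described above, a sketch of the commands-and-navigation style of AGENTS.md - the monorepo layout, build tool, and targets are hypothetical examples, not taken from any project in the thread:

```markdown
# AGENTS.md

## Commands (use these; do not rediscover them)
- Build only the touched package: `bazel build //services/checkout/...`
- Run the fast unit suite, not the full CI job: `make test-unit`
- Lint before committing: `make lint`

## Repo map
- `services/`  - deployable services, one directory per team
- `libs/`      - shared code; changes here need an `OWNERS` review
- `tools/ci/`  - generated from `ci.yaml`; never edit by hand
```

A file like this spends a few hundred tokens up front to save the agent the thousands it would otherwise burn rediscovering the same facts every session.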
▲ bootsmann 5 hours ago
This reads a lot like the bargaining stage. If agentic AI makes me a 10-times-more-productive developer, surely a 4% improvement is barely worth the token cost.
▲ zero_k 4 hours ago
Honestly, the more research papers I read, the more suspicious I get. This "surprisingly" and other hyperbole is just there to make reviewers think the authors did something interesting or exciting. But the more "surprises" there are in a paper, the more suspicious of it I am. Such hyperbole ought at best to be ignored; at worst, the exact opposite claim needs to be examined.

It seems like the best students/people eventually end up doing CS research in their spare time while working as engineers. This is not the case for many other disciplines, where you need e.g. a lab to do research. But in CS you can just do it from your basement - all you need is a laptop.
▲ pgt 2 hours ago
4% is yuuuge. In hard projects, 1% is the difference between getting it right with an elegant design and going completely off the rails.