rmunn 5 hours ago
If I understand the paper correctly, the researchers found that AGENTS.md context files caused the LLMs to burn through more tokens as they parsed and followed the instructions, but did not produce a large change in the success rate (defined as "the PR passes the repo's existing unit tests").

What wasn't measured, probably because it's almost impossible to quantify, was the quality of the code produced. Did the context files help the LLMs match the style of the rest of the project? Did the code end up reasonably maintainable in the long run, or was it slop that added long-term tech debt? These are important questions, but since they are extremely difficult to quantify and measure in an automated way, the paper didn't attempt to answer them.
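For context, a minimal AGENTS.md of the kind being studied might look something like this (the contents here are a hypothetical sketch, not taken from the paper):

    # AGENTS.md (hypothetical example)

    ## Code style
    - Use 4-space indentation; run `black` before committing.
    - Prefer pathlib over os.path in new code.

    ## Testing
    - Run `pytest tests/` and ensure it passes before opening a PR.

Notice that most of these instructions are exactly the style and maintainability concerns the benchmark's pass/fail test metric can't see, which is why their effect goes unmeasured.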