tartakovsky 8 hours ago
Well,
task == Resolving real GitHub Issues
Languages == Python only
Libraries (um looks like other LLM generated libraries -- I mean definitely not pure human: like Ragas, FastMCP, etc)
So seems like a highly skewed sample and who knows what can / can't be generalized. Does make for a compelling research paper though!
nielstron 7 hours ago
Hey, paper author here. We did try to get an even sample - we include both SWE-bench repos (which are large, popular and mostly human-written) and a sample of smaller, more recent repositories with existing AGENTS.md (these tend to contain LLM-written code, of course). Our findings generalize across both these samples. What is arguably missing are small repositories of completely human-written code, but these are quite difficult to obtain nowadays.
bootsmann 5 hours ago
> Libraries (um looks like other LLM generated libraries -- I mean definitely not pure human: like Ragas, FastMCP, etc)
How does this invalidate the result? Aren't AGENTS.md files put exactly into those repos that are partly generated using LLMs?
locknitpicker 6 hours ago
I think that is a rather fitting approach to the problem domain. A task being a real GitHub issue is a solid definition by any measure, and I see no problem picking language A over B or C. If you feel strongly about the topic, you are free to write your own article.