anotherpaulg 6 days ago
Aider author here. Based on some DMs with the Gemini team, it sounds like they weren't aware that aider supports a "diff-fenced" edit format, which is specifically tuned to work well with Gemini models, so they didn't think to try it when they ran the aider benchmarks internally. Beyond that, I spend significant energy tuning aider to work well with top models. That is in fact the entire reason for aider's benchmark suite: to quantitatively measure and improve how well aider works with LLMs. Aider makes various adjustments to how it prompts and interacts with nearly every top model, to provide the best possible AI coding results.
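For anyone unfamiliar with it: my understanding from aider's docs is that "diff-fenced" is a variant of the search/replace "diff" format where the file path sits inside the fenced code block rather than above it, which matches how Gemini models tend to emit edits. A rough sketch of what one such edit block looks like (file name and code invented for illustration):

    ```
    greeting.py
    <<<<<<< SEARCH
    def hello():
        print("hello")
    =======
    def hello(name):
        print(f"hello {name}")
    >>>>>>> REPLACE
    ```

If you're running the benchmarks yourself, aider's --edit-format flag lets you select the format explicitly rather than relying on the per-model default, something like (model name illustrative):

    aider --model gemini/gemini-2.5-pro --edit-format diff-fenced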
BonoboIO 6 days ago
Thank you for providing such amazing tools for us. Aider is a godsend when working with a large codebase to get an overview.
modeless 6 days ago
Thanks, that's interesting info. It seems to me that such tuning, while it makes Aider more useful and makes the benchmark useful in the specific context of choosing a model for Aider itself, reduces the benchmark's value for evaluating overall model quality in other tools or contexts, which is how people use it today. Models that get more tuning will outperform models that get less, and existing models will have an advantage over new ones simply by virtue of already being tuned.