I just finished updating the aider polyglot leaderboard [0] with GPT-4.1, mini and nano. My results basically agree with OpenAI's published numbers.

Results, with other models for comparison:

    Model                         Score   Cost

    Gemini 2.5 Pro Preview 03-25  72.9%  $ 6.32
    claude-3-7-sonnet-20250219    64.9%  $36.83
    o3-mini (high)                60.4%  $18.16
    Grok 3 Beta                   53.3%  $11.03
  * gpt-4.1                       52.4%  $ 9.86
    Grok 3 Mini Beta (high)       49.3%  $ 0.73
  * gpt-4.1-mini                  32.4%  $ 1.99
    gpt-4o-2024-11-20             18.2%  $ 6.74
  * gpt-4.1-nano                   8.9%  $ 0.43

Aider v0.82.0 is also out with support for these new models [1]. Aider wrote 92% of the code in this release, a tie with v0.78.0 from 3 weeks ago.

[0] https://aider.chat/docs/leaderboards/

[1] https://aider.chat/HISTORY.html

▲ pzo 6 days ago | parent | next [-]

Did you benchmark the combo DeepSeek R1 + DeepSeek V3 (0324)? The combo DeepSeek R1 + claude-3-5-sonnet-20241022 is in 3rd place, and the new V3 beats Claude 3.5, so in theory R1 + V3 should rank even higher, possibly 2nd. Just curious whether that would be the case.

▲ purplerabbit 6 days ago | parent | prev [-]

What model are you personally using in your aider coding? :)

▲ anotherpaulg 6 days ago | parent [-]

Mostly Gemini 2.5 Pro lately. I get asked this often enough that I have a FAQ entry with automatically updating statistics [0].

    Model            Tokens      Pct

    Gemini 2.5 Pro   4,027,983  88.1%
    Sonnet 3.7         518,708  11.3%
    gpt-4.1-mini        11,775   0.3%
    gpt-4.1             10,687   0.2%

[0] https://aider.chat/docs/faq.html#what-llms-do-you-use-to-bui...