Remix clone Hacker News

I'll look at it when this shows up on https://aider.chat/docs/leaderboards/. Keeping up with all the models feels like a full-time job, so I just use this instead and hopefully get 90% of the benefit I would get by manually testing every model.


evantbyrne a day ago | parent | next [-]

Are these just leetcode exercises? What I would like to see is an independent benchmark based on real tasks in codebases of varying size.


rafram a day ago | parent | next [-]

Aider uses a dataset of 500 GitHub issues, so not LeetCode-style work.

evantbyrne a day ago | parent [-]

It says right on that linked page:

> Aider’s polyglot benchmark tests LLMs on 225 challenging Exercism coding exercises across C++, Go, Java, JavaScript, Python, and Rust.

I looked up Exercism and they appear to be story problems that you solve by coding on mostly/entirely blank slates, unless I'm missing something? That format would seem to explain why the models are reportedly performing so well, because they definitely aren't that reliable on mature codebases.


KaoruAoiShiho a day ago | parent | prev [-]

Aider is not just LeetCode exercises, I think? LiveCodeBench is LeetCode exercises, though.
