▲ | rafram a day ago | |
Aider uses a dataset of 500 GitHub issues, so not LeetCode-style work. | ||
▲ | evantbyrne a day ago | parent [-] | |
It says right on that linked page:

> Aider’s polyglot benchmark tests LLMs on 225 challenging Exercism coding exercises across C++, Go, Java, JavaScript, Python, and Rust.

I looked up Exercism and they appear to be story problems that you solve by coding on mostly/entirely blank slates, unless I'm missing something? That format would seem to explain why the models are reportedly performing so well, because they definitely aren't that reliable on mature codebases. |