gallerdude 3 days ago
For coding, I like the Aider polyglot benchmark, since it covers multiple programming languages. Gemini 2.5 Pro got 72.9%, o3 high gets 81.3%, and o4-mini high gets 68.9%.
croemer 3 days ago
Isn't it easy to train on the specific Exercism exercises that this benchmark uses?
vessenes 3 days ago
Where did you find those o3 high numbers? https://aider.chat/docs/leaderboards/ currently lists Gemini 2.5 Pro as the leader at 72.9%, as you say.
jumpCastle 3 days ago
It was a good benchmark until it entered the training set.
asadm 3 days ago
thanks |