▲ | bsamuels a day ago | |
As soon as you publish a benchmark like this, it becomes worthless, because it can be included in the training corpus. | ||
▲ | rbjorklin a day ago | parent [-] | |
While I agree with you in principle, give Claude 4 a try on something like: https://open.kattis.com/problems/low . I would expect this problem to have been included in the training material, along with solutions found on GitHub. I've tried providing the problem description and asking Claude Sonnet 4 to solve it, and so far it hasn't been successful. |