bsamuels a day ago

As soon as you publish a benchmark like this, it becomes worthless because it can be included in the training corpus.

rbjorklin a day ago | parent [-]

While I agree with you in principle, give Claude 4 a try on something like https://open.kattis.com/problems/low . I would expect this problem, as well as solutions found on GitHub, to have been included in the training material. I've tried providing the problem description and asking Claude Sonnet 4 to solve it, and so far it hasn't been successful.