I don't really consider that a great benchmark anyway. We really need better, objective ones instead of these, which are mostly performative, cheatable, and already available in the training set.