That is for sure what everyone does. Also, they train on the evals, using the same datasets they would be benchmarked against.