▲ | lewdwig a day ago | ||||||||||||||||
Well-designed benchmarks have a public sample set and a private testing set. Models are free to train on the public set, but they can't game the benchmark or overfit the samples that way because they're only assessed on performance against examples they haven't seen. Not all benchmarks are well-designed. | |||||||||||||||||
▲ | thousand_nights a day ago | parent [-] | ||||||||||||||||
but as soon as you test on your private testing set you're sending it to their servers so they have access to it so effectively you can only guarantee a single use stays private | |||||||||||||||||
|