dguest | 17 hours ago
For better or worse, the people working on this don't really use perplexity or accuracy to evaluate models. The target is whatever you'd get for those metrics if you used the discriminants provided in the dataset (i.e. the GN2v01 values). As for why accuracy and perplexity aren't reported: the experiments generally choose a threshold to consider something a "b-hadron" (basically picking a point along the ROC curve) and quantify the TPR and FPR at that point. There are reasons for this, mostly that picking a standard point lets them verify that the simulation actually reflects data. See, for example, the FPR [1] and TPR [2] "calibrations". It's a good point, though: the physicists should probably try harder to report the standard metrics that the rest of the ML community uses.
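A rough sketch of the working-point idea, with toy numbers rather than the actual GN2 setup: fix the threshold so the signal efficiency (TPR) hits a chosen value, then report the background mistag rate (FPR) at that same threshold.

```python
import numpy as np

# Toy discriminant scores: signal (b-hadron jets) scores higher on
# average than background (light jets). Distributions are made up.
rng = np.random.default_rng(0)
b_scores = rng.normal(2.0, 1.0, 10_000)      # signal
light_scores = rng.normal(0.0, 1.0, 10_000)  # background

# Pick the threshold that gives a chosen signal efficiency, e.g. 70%.
# This is "picking a point along the ROC curve".
target_tpr = 0.70
threshold = np.quantile(b_scores, 1.0 - target_tpr)

tpr = np.mean(b_scores >= threshold)      # ~0.70 by construction
fpr = np.mean(light_scores >= threshold)  # mistag rate at this working point

print(f"threshold={threshold:.3f}  TPR={tpr:.3f}  FPR={fpr:.4f}")
```

The point of freezing a working point like this is that the same threshold can be applied in both simulation and real data, so the measured TPR and FPR can be compared directly (the "calibrations" mentioned above).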