I'd say give it some time for the dust to settle. This field badly needs standardized benchmarks before the conversation about which models are actually good can even start.