Simon, does your pelican test really capture differences among models, or should you at least run it something like 10 times to average out the random effects?

simonw 6 hours ago | parent [-]

I've been meaning to do a "run 3 times and pick the best" version for quite a while; I should really pull the trigger on that one. Currently it's one-shot only.

xiphias2 5 hours ago | parent [-]

Best-of-3 would be cheating and would ruin the test; middle-of-3 makes more sense.

nik736 5 hours ago | parent [-]

Why would you need the 3rd run if you pick the "one in the middle"?

jmaw 3 hours ago | parent [-]

Middle as in not the best and not the worst, as opposed to the second one generated in sequence. But "not the best, not the worst" is somewhat subjective, so I'm not sure how well that would work.
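
For what it's worth, here is a rough sketch of how the options in this thread differ, assuming each run could be given a numeric score (a big assumption, since the judging is subjective). The scores and the `aggregate` helper are made up for illustration and are not anything the pelican test actually uses.

```python
from statistics import mean, median


def aggregate(scores):
    """Summarise repeated runs of one model in three ways."""
    return {
        "best": max(scores),       # best-of-n: rewards a lucky run
        "median": median(scores),  # middle-of-n: ignores both extremes
        "mean": mean(scores),      # average: uses every run
    }


# Hypothetical 0-10 scores for three runs of the same prompt.
runs = [4, 9, 6]
summary = aggregate(runs)
print(summary)
```

With an odd number of runs the median is always one of the actual outputs, which is why you still need the third run: it tells you which of the three is the middle one.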