Remix clone Hacker News

	▲	maxrmk 3 days ago
		Very cool. Do you do anything to mitigate ordering bias in the evaluation function, or do you just expect it to average out over time?
	▲	kcorbitt 3 days ago
		No, we don't do anything about it right now. In principle we could judge several times with different orderings. Measuring order bias would be really easy, though: we just need to look at the average score by rollout position across many runs. I'll add that to my list of experiments!
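
For anyone wanting to try the two ideas in this reply, here is a rough Python sketch, not the project's actual code. It assumes the judge can be treated as a function that takes a list of rollouts and returns one score per rollout in the same order. The names `mean_score_by_position`, `judge_with_shuffles`, and the `judge` callable are all hypothetical, made up for illustration.

```python
import random
from collections import defaultdict
from typing import Callable, Sequence


def mean_score_by_position(runs: Sequence[Sequence[float]]) -> list[float]:
    """Average judge score at each rollout position across many runs.

    runs[i][j] is the score given to whatever rollout was shown in
    position j during run i. Runs may have different lengths.
    """
    totals: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for scores in runs:
        for position, score in enumerate(scores):
            totals[position] += score
            counts[position] += 1
    return [totals[p] / counts[p] for p in sorted(counts)]


def judge_with_shuffles(
    rollouts: Sequence[str],
    judge: Callable[[list[str]], list[float]],  # hypothetical judge interface
    n_orderings: int = 4,
    seed: int = 0,
) -> list[float]:
    """Judge the same rollouts under several random orderings.

    Returns the mean score per rollout, indexed by the original order,
    so any position effect is averaged over the shuffles.
    """
    rng = random.Random(seed)
    sums = [0.0] * len(rollouts)
    for _ in range(n_orderings):
        order = list(range(len(rollouts)))
        rng.shuffle(order)
        scores = judge([rollouts[i] for i in order])
        # Map each score back to the rollout it belongs to.
        for position, original_index in enumerate(order):
            sums[original_index] += scores[position]
    return [s / n_orderings for s in sums]
```

If the judge has no order bias and rollouts are placed randomly, the per-position means from `mean_score_by_position` should be roughly flat across positions; a consistent slope would suggest the judge favors earlier or later slots. Judging under several orderings multiplies judge calls by `n_orderings`, so it trades cost for lower bias.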