Excellent paper. I like how much of the explanation had to focus on the judges' rationale, given how consistent the LLM responses were.