And this was just about how to decide an auto accident case, with the experiment varying the circumstances.
My summary is still: seasoned judges disagree with LLM output 50% of the time.