Confidence estimation is a better metric than agreement for LLM judges (arxiv.org)
3 points by rapiddev 8 hours ago