▲ | nimitkalra 12 hours ago | |
There are technical quirks that make LLM judges particularly high variance, sensitive to artifacts in the prompt, and positively/negatively-skewed, as opposed to the subjectivity of human judges. These largely arise from their training distribution and post-training, and can be contained with careful calibration. |