PrimordialEdg71 11 hours ago
LLMs make impressive graders of convenience, but their judgments swing wildly with prompt phrasing and option order. Treat them like noisy crowd-raters: randomize the option order and prompt wording, ensemble the outputs, and keep a human in the loop whenever a difference of a few accuracy points matters.
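To make the "randomize inputs, ensemble outputs" part concrete, here is a rough sketch, not a reference implementation. Everything named in it is an assumption for illustration: the `judge` callable stands in for whatever LLM call you use, and `ensemble_judge`, `n_trials`, and the two toy judges are hypothetical. It shuffles the option order on every trial, maps each pick back to the original index, and reports the majority vote along with the agreement rate.

```python
import random
from collections import Counter
from typing import Callable, Sequence, Tuple

# Hypothetical judge signature (an assumption, not a real API): given a prompt
# and the options in the order shown, return the index of the preferred option
# within that shown order.
Judge = Callable[[str, Sequence[str]], int]


def ensemble_judge(
    judge: Judge,
    prompt: str,
    options: Sequence[str],
    n_trials: int = 5,
    seed: int = 0,
) -> Tuple[int, float]:
    """Majority vote over trials with shuffled option order.

    Returns (winning original index, fraction of trials that agreed with it).
    """
    rng = random.Random(seed)
    votes: Counter = Counter()
    for _ in range(n_trials):
        order = list(range(len(options)))
        rng.shuffle(order)                      # randomize presentation order
        shown = [options[i] for i in order]
        picked = judge(prompt, shown)           # index into the shuffled list
        votes[order[picked]] += 1               # map back to the original index
    winner, count = votes.most_common(1)[0]
    return winner, count / n_trials


# Toy judge with pure position bias: always picks whatever is shown first.
def first_slot_judge(prompt: str, shown: Sequence[str]) -> int:
    return 0


# Toy judge that looks at content only: always picks the longest option.
def length_judge(prompt: str, shown: Sequence[str]) -> int:
    return max(range(len(shown)), key=lambda i: len(shown[i]))


if __name__ == "__main__":
    opts = ["short", "a much longer answer", "mid one"]
    print(ensemble_judge(length_judge, "Which is best?", opts))
    print(ensemble_judge(first_slot_judge, "Which is best?", opts, n_trials=50))
```

A judge that only reacts to content keeps full agreement under shuffling, while a position-biased one scatters its votes across options. A low agreement rate is a reasonable trigger for sending the item to a human reviewer; where to set that threshold is a judgment call this sketch does not settle.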