Remix clone Hacker News

	PrimordialEdg71 11 hours ago
		LLMs make impressive graders-of-convenience, but their judgments swing wildly with prompt phrasing and option order. Treat them like noisy crowd-raters: randomize inputs, ensemble outputs, and keep a human in the loop whenever single-digit accuracy points matter.
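For anyone who wants to try the randomize-and-ensemble idea, here is a rough sketch. It assumes a hypothetical `judge(question, shown_options)` callable that returns the index of the option it picks (in practice that would wrap your LLM call). The names `ensemble_grade`, `n_trials`, and `min_agreement`, and the 0.8 threshold, are made up for illustration and are not from any library.

```python
import random
from collections import Counter
from typing import Callable, Optional, Sequence

# Hypothetical judge signature (an assumption, not a real API): receives the
# question and the options in the order shown, returns the index it picks.
Judge = Callable[[str, Sequence[str]], int]


def ensemble_grade(
    judge: Judge,
    question: str,
    options: Sequence[str],
    n_trials: int = 5,
    min_agreement: float = 0.8,  # illustrative threshold, tune for your task
    seed: Optional[int] = None,
) -> dict:
    """Ask the judge several times with shuffled option order, then majority-vote."""
    rng = random.Random(seed)
    votes: Counter = Counter()

    for _ in range(n_trials):
        # Shuffle the presentation order so position bias averages out.
        order = list(range(len(options)))
        rng.shuffle(order)
        shown = [options[i] for i in order]

        picked = judge(question, shown)
        # Map the pick back to the option's index in the original list.
        votes[order[picked]] += 1

    winner, count = votes.most_common(1)[0]
    agreement = count / n_trials
    return {
        "winner": options[winner],
        "agreement": agreement,
        # Low agreement means the verdict depends on ordering: escalate to a human.
        "needs_human": agreement < min_agreement,
    }


# Two toy judges standing in for real model calls.
def position_biased_judge(question: str, shown: Sequence[str]) -> int:
    return 0  # always picks whatever is listed first


def content_judge(question: str, shown: Sequence[str]) -> int:
    # Picks the longest option regardless of where it appears.
    return max(range(len(shown)), key=lambda i: len(shown[i]))


if __name__ == "__main__":
    opts = ["short", "a much longer answer"]
    print(ensemble_grade(content_judge, "Which is better?", opts, seed=0))
    print(ensemble_grade(position_biased_judge, "Which is better?", opts, n_trials=200, seed=0))
```

A judge that only ever picks the first slot ends up splitting its votes once the order is shuffled, so its agreement drops and the item gets flagged for human review. A judge that tracks content keeps full agreement. Varying the prompt phrasing per trial would slot into the same loop.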