Remix clone Hacker News

	▲	renewiltord 3 hours ago
		I’m sure you’ve tried all this, but have you tried measuring inter-rater agreement via multiple attempts on the same LLM vs. different LLMs? Perhaps your system would work better if you ran it through 5 models 3 times and then highlighted the diffs for a human chooser.
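
Roughly what I have in mind, as a minimal sketch rather than a real implementation: it assumes you have already collected the outputs yourself, keyed by a hypothetical `(model, run)` pair, and it uses exact string match as a crude stand-in for agreement. The function names and sample data are made up for illustration; no actual LLM calls are made.

```python
from collections import Counter
from difflib import unified_diff
from itertools import combinations


def agreement(outputs, same_model):
    """Exact-match rate over output pairs from the same model (or from different models)."""
    pairs = [
        (a, b)
        for (ka, a), (kb, b) in combinations(outputs.items(), 2)
        if (ka[0] == kb[0]) == same_model
    ]
    # None means there were no pairs of that kind to compare.
    return sum(a == b for a, b in pairs) / len(pairs) if pairs else None


def diffs_against_majority(outputs):
    """Return the most common output and a unified diff for each deviating rater."""
    majority, _ = Counter(outputs.values()).most_common(1)[0]
    diffs = {}
    for rater, text in outputs.items():
        if text != majority:
            diffs[rater] = "\n".join(unified_diff(
                majority.splitlines(), text.splitlines(),
                fromfile="majority", tofile=str(rater), lineterm=""))
    return majority, diffs


# Hypothetical outputs keyed by (model, run); real ones would come from your LLM calls.
outputs = {
    ("model_a", 1): "total = 42",
    ("model_a", 2): "total = 42",
    ("model_b", 1): "total = 41",
}

print("within-model agreement:", agreement(outputs, same_model=True))
print("cross-model agreement:", agreement(outputs, same_model=False))

majority, diffs = diffs_against_majority(outputs)
for rater, diff in diffs.items():
    print(rater)
    print(diff)
```

If within-model agreement is high but cross-model agreement is low, the disagreement is probably about the models rather than sampling noise, and those are the diffs worth putting in front of the human chooser. For free-form text you would likely want something softer than exact match.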