Remix clone Hacker News


	▲	great_psy 9 hours ago
		How do you measure quality at scale? Is there another model that determines whether the output adheres to the codebase's standards?
	▲	swyx 9 hours ago | parent
		See Beyond Unit Tests and Novel Grading Methods in TFA. I think something like 60% are LLM-as-judge rubrics and the rest are as described there. Every rubric is validated by a maintainer. 3000 rubrics in total.