great_psy 9 hours ago
How do you measure quality at scale? Is there another model that determines whether the output adheres to the codebase's standards?
swyx 9 hours ago
See Beyond Unit Tests and Novel Grading Methods in TFA. I think it's something like ~60% LLM-as-judge rubrics and the rest as described. Every rubric is validated by a maintainer. 3000 rubrics in total.