khafra | 4 days ago
https://openai.com/index/prover-verifier-games-improve-legib...

OpenAI has been doing verifier-guided training since last year. No SOTA model was trained without verified reward training for math and programming.
troupo | 4 days ago
Your claim: "by reading the docs, and by autogenerating code samples and testing them against verifiers, and by paying a lot of people to write sample code for sample questions."

Your link: "Grade school math problems from a hardcoded dataset with hardcoded answers" [1]

It really is the same thing.

[1] https://openai.com/index/solving-math-word-problems/

--- start quote ---

GSM8K consists of 8.5K high quality grade school math word problems. Each problem takes between 2 and 8 steps to solve, and solutions primarily involve performing a sequence of elementary calculations using basic arithmetic operations (+ − × ÷) to reach the final answer.

--- end quote ---