djoldman 5 days ago
See Figure 2. The solver/challenger is the GAN discriminator/generator. The challenger is trained to create difficult questions. The solver is trained to strengthen pathways that correctly solve the questions like so:

> To guide the Challenger toward producing challenging yet solvable questions, we first define an uncertainty score. For a generated question x, we query the current Solver... The most frequent response is treated as the pseudo-label ỹ(x), and we compute the Solver’s empirical accuracy.... The uncertainty reward is then defined.... This function incentivizes questions where the Solver is maximally uncertain (accuracy approaches 50%)

Identifying the best pseudo-label seems like it would be the limitation of the approach.
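For reference, a minimal Python sketch of the scoring described in the quote. The quoted passage elides the actual reward definition, so `uncertainty_reward` below uses an assumed form, 1 - 2|acc - 0.5|, chosen only because it peaks at 50% accuracy. The function names and the sample answers are hypothetical, not taken from the paper.

```python
from collections import Counter


def pseudo_label_and_accuracy(answers):
    """Return the majority-vote pseudo-label and the fraction of samples matching it."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    label, count = Counter(answers).most_common(1)[0]
    return label, count / len(answers)


def uncertainty_reward(accuracy):
    """ASSUMED form (the quote elides the real one): 1.0 at accuracy 0.5, 0.0 at 0 or 1."""
    return 1.0 - 2.0 * abs(accuracy - 0.5)


# Hypothetical Solver samples for one Challenger question.
samples = ["12", "12", "15", "12", "15", "15", "12", "15"]
label, acc = pseudo_label_and_accuracy(samples)
reward = uncertainty_reward(acc)
print(label, acc, reward)
```

Note that the "accuracy" here is measured against the Solver's own most frequent answer, not against any external ground truth.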
frumiousirc 4 days ago
> Identifying the best pseudo-label seems like it would be the limitation of the approach.

Yes, I think this says in a different way what I'm trying to express. In a GAN, the Discriminator pegs the training to some chosen reality (assuming the "real" data set is truly real). In Challenger/Solver alone, there is no peg. The Solver could hallucinate consistently and "win" the race. It's the consistency that is the goal.

Using GPT-4o as an arbiter of the Challenger/Solver training provides the reality peg (or rather, the peg that biases toward GPT-4o's training set).
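A small Python sketch of the no-peg problem, assuming the Solver is scored only by agreement with its own majority vote. The function name, the sampled answers, and the ground-truth value are all made up for illustration.

```python
from collections import Counter


def agreement_and_true_accuracy(answers, truth):
    """Compare agreement with the majority answer against accuracy on a known truth."""
    label, count = Counter(answers).most_common(1)[0]
    agreement = count / len(answers)  # what a self-consistency signal sees
    true_accuracy = sum(a == truth for a in answers) / len(answers)  # what an external peg sees
    return label, agreement, true_accuracy


# Hypothetical Solver that hallucinates the same wrong answer every time.
confident_wrong = ["41"] * 8
label, agreement, true_acc = agreement_and_true_accuracy(confident_wrong, truth="42")
print(label, agreement, true_acc)
```

Here the Solver is perfectly consistent (agreement 1.0) while its accuracy against the hypothetical true answer is 0.0, so a signal based on consistency alone cannot tell this case apart from a Solver that is consistently right.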