Gracana | 3 days ago |
They probably don’t do it at a scale large enough to do RLHF with it, but it’s still useful feedback for the people working on the projects / products.
zozbot234 | 3 days ago | parent |
More recent models actually use "reinforcement learning from AI feedback" (RLAIF), where the task of assigning a reward is essentially fed back into the model itself. Human feedback is then only used to ground the training, on selected examples (potentially even entirely artificial ones) where the AI is most uncertain about what feedback should be given.
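A rough sketch of that kind of loop, purely as an illustration (the function names, the confidence threshold, and the way uncertainty is measured are all assumptions here, not any particular lab's pipeline):

    import random

    def ai_reward(prompt: str, response: str) -> tuple[float, float]:
        """Stand-in for the AI judge scoring a response (RLAIF).
        Returns (reward, confidence); random values for illustration only."""
        return random.random(), random.random()

    def human_reward(prompt: str, response: str) -> float:
        """Stand-in for routing a sample to a human labeler."""
        return 0.5  # placeholder; a real pipeline would collect a human rating

    def assign_reward(prompt: str, response: str,
                      confidence_threshold: float = 0.3) -> float:
        """Use the AI's own judgement for the reward, but fall back to
        human feedback when the AI judge is too uncertain."""
        reward, confidence = ai_reward(prompt, response)
        if confidence < confidence_threshold:
            # Only the most uncertain cases get grounded by a human label.
            reward = human_reward(prompt, response)
        return reward

The point is just that human labels end up concentrated on the small slice of examples where the AI judge can't be trusted, rather than on every rollout.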