throwawaymaths 5 days ago
You're missing the point. SAT-style negative marking for random guesses is fine: you could trivially turn that strategy into a cost function for a classifier and backpropagate through it. But how do you give negative weight to a wrong answer when training a transformer?
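For the classifier half of that, a minimal sketch (assuming PyTorch and a plain softmax classifier; the function name and penalty value are made up) of SAT-style negative marking as a differentiable loss:

```python
import torch
import torch.nn.functional as F

def sat_style_loss(logits, targets, wrong_penalty=0.25):
    # Expected SAT score under the model's predictive distribution:
    # +1 credit for probability on the correct choice, -wrong_penalty
    # for probability on any wrong choice. Minimizing the negative
    # expected score backpropagates a penalty for confident wrong answers.
    probs = F.softmax(logits, dim=-1)                                # (batch, n_choices)
    p_correct = probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    expected_score = p_correct - wrong_penalty * (1.0 - p_correct)
    return -expected_score.mean()

# Toy usage: 8 questions, 4 choices each.
logits = torch.randn(8, 4, requires_grad=True)
targets = torch.randint(0, 4, (8,))
sat_style_loss(logits, targets).backward()
```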
ACCount37 5 days ago
In RLVR (reinforcement learning with verifiable rewards)? Quite easily. And OpenAI's hallucinations in o3 were induced by mistakes in RLVR, not by a failed pre-training run. They used o4-mini as an example: similar training to o3, and similar issues. Conversely, they have also designed a post-training system that has successfully reduced hallucinations in GPT-5.
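A minimal sketch of the RLVR idea, not OpenAI's actual setup: a toy policy, a stub verifier, and a REINFORCE-style update in which a verified-wrong answer gets a reward of -1, i.e. its log-probability is explicitly pushed down:

```python
import torch

# Toy stand-in for a language-model policy: a lookup table mapping a
# "prompt" id to logits over a small answer vocabulary. The model is
# deliberately trivial; only the reward logic matters here.
class ToyPolicy(torch.nn.Module):
    def __init__(self, n_prompts=10, n_answers=5):
        super().__init__()
        self.table = torch.nn.Parameter(torch.zeros(n_prompts, n_answers))

    def forward(self, prompt_ids):
        return self.table[prompt_ids]  # (batch, n_answers)

def verify(answers):
    # Stub verifier: pretend answer 0 is always the checkable-correct one.
    return (answers == 0).float()

policy = ToyPolicy()
opt = torch.optim.SGD(policy.parameters(), lr=0.1)

for _ in range(200):
    prompts = torch.randint(0, 10, (32,))
    dist = torch.distributions.Categorical(logits=policy(prompts))
    answers = dist.sample()
    # Verifiable reward: +1 if the checker accepts the answer, -1 if not.
    # The -1 is the "negative weight for a wrong answer".
    rewards = verify(answers) * 2.0 - 1.0
    # REINFORCE-style update: raise the log-prob of rewarded samples,
    # lower it for penalized ones.
    loss = -(rewards * dist.log_prob(answers)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```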
RugnirViking 5 days ago
Isn't this just the question of "how do you train a transformer"? You give it wrong examples and use the optimizer to move away from that kind of completion.
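One concrete way to "move away from that kind of completion" is an unlikelihood-style term alongside the usual cross-entropy; a rough sketch, assuming PyTorch, made-up tensor shapes, and that you actually have known-bad tokens to penalize:

```python
import torch
import torch.nn.functional as F

def unlikelihood_loss(logits, good_tokens, bad_tokens, alpha=1.0, eps=1e-6):
    # Standard cross-entropy toward the correct token, plus an
    # "unlikelihood" term that maximizes log(1 - p(bad token)),
    # i.e. it actively pushes probability mass off a known-bad completion.
    log_probs = F.log_softmax(logits, dim=-1)                        # (batch, vocab)
    nll = -log_probs.gather(-1, good_tokens.unsqueeze(-1)).squeeze(-1)
    p_bad = log_probs.gather(-1, bad_tokens.unsqueeze(-1)).squeeze(-1).exp()
    ul = -torch.log((1.0 - p_bad).clamp_min(eps))
    return (nll + alpha * ul).mean()

# Toy usage: 4 positions, vocabulary of 100 tokens.
logits = torch.randn(4, 100, requires_grad=True)
good = torch.randint(0, 100, (4,))
bad = torch.randint(0, 100, (4,))
unlikelihood_loss(logits, good, bad).backward()
```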