▲ | keskival 3 days ago | |
"I’m not sure, because OpenAI doesn’t deign to share gpt-4-base, nor to allow queries of gpt-4o in completion mode." I would guess GPT-4o isn't first pre-trained and then instruct-tuned, but trained directly with refined instruction-following material. This material probably contains way fewer chess games. | ||
▲ | toxik 3 days ago | parent [-] | |
Why do you think that? InstructGPT was predominantly trained as a next-token predictor on whatever soup of data OpenAI curated at the time. The alignment signal (both RL part and the supervised prompt/answer pairs) are a tiny bit of the gradient. |