Remix.run Logo
keskival 3 days ago

"I’m not sure, because OpenAI doesn’t deign to share gpt-4-base, nor to allow queries of gpt-4o in completion mode."

I would guess GPT-4o isn't first pre-trained and then instruct-tuned, but trained directly with refined instruction-following material.

This material probably contains way fewer chess games.

toxik 3 days ago | parent [-]

Why do you think that? InstructGPT was predominantly trained as a next-token predictor on whatever soup of data OpenAI curated at the time. The alignment signal (both RL part and the supervised prompt/answer pairs) are a tiny bit of the gradient.