hadlock 4 days ago

There's a performance plateau with training time and number of parameters, and then once you get over "the hump" the error rate starts going down again almost linearly. GPT existed before OpenAI, but it was theorized that the plateau was a dead end. The sell to VCs in the early GPT-3 era was "with enough compute, enough time, and enough parameters... it'll probably just start thinking, and then we have AGI." Sometime around the o3 era they realized they'd hit a wall, and performance actually started to decrease as they added more parameters and time. But yeah, basically at the time they needed money for more compute, parameters, and time. I would have loved to have been a fly on the wall in those "AGI" pitches. Don't forget that Microsoft's agreement with OpenAI specifically ends when AGI is achieved. At the time, coming over the hump, it really did look like we were going to get AGI in a few months.

I'm really looking forward to the "The Social Network" treatment of OpenAI, whenever that movie happens.

whimsicalism 4 days ago

Source? I work in this field and have never heard of the initial plateau you are referring to.

reasonableklout 2 days ago

Maybe hadlock is thinking of double descent? https://openai.com/index/deep-double-descent/
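For anyone who wants to see the shape, here's a minimal, self-contained sketch of a double-descent curve (not taken from the linked post; the random-Fourier-feature setup, sample sizes, and seed are all arbitrary illustrative choices). With a minimum-norm least-squares fit, test error typically spikes near the interpolation threshold, where the feature count matches the number of training samples, and then falls again as capacity keeps growing:

    # Toy double-descent demo: minimum-norm least squares on random
    # Fourier features. Everything here (feature construction, sizes,
    # seed) is an arbitrary illustrative choice, not from the article.
    import numpy as np

    rng = np.random.default_rng(0)

    def make_data(n, noise=0.1):
        x = rng.uniform(-1, 1, size=n)
        y = np.sin(2 * np.pi * x) + noise * rng.standard_normal(n)
        return x, y

    def features(x, p, freqs, phases):
        # Fixed random Fourier features: cos(freq * x + phase)
        return np.cos(np.outer(x, freqs[:p]) + phases[:p])

    n_train = 30
    x_tr, y_tr = make_data(n_train)
    x_te, y_te = make_data(1000, noise=0.0)

    max_p = 300
    freqs = rng.normal(0, 10, size=max_p)
    phases = rng.uniform(0, 2 * np.pi, size=max_p)

    for p in [5, 10, 20, 25, 30, 35, 50, 100, 300]:
        Phi_tr = features(x_tr, p, freqs, phases)
        Phi_te = features(x_te, p, freqs, phases)
        # pinv gives the minimum-norm solution once p > n_train
        w = np.linalg.pinv(Phi_tr) @ y_tr
        mse = np.mean((Phi_te @ w - y_te) ** 2)
        print(f"features={p:4d}  test MSE={mse:.3f}")

Test error should climb as the feature count approaches 30 (the training-set size) and drop again well past it; the exact numbers depend on the noise level and seed, but that non-monotone shape is roughly what the double-descent post describes.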