jfim 5 hours ago
Indeed. It's pretty interesting to realize after implementing GPT-2 that, model-wise, the frontier models are scaled-up versions of that, with various tweaks to improve performance. The secret sauce, though, is all the datasets, the RL training, the knowledge of what works from doing all kinds of ablation experiments, and a massive compute moat.
gobdovan 3 hours ago
The secret sauce is also having the necessary 'creativity' to avoid being cease-and-desisted into oblivion, or jailed, over all the copyrighted material you trained your model on. Btw, I'm not making a moral judgement; [0] shows Michael and Dalton from YC discussing why Ilya Sutskever had to leave Google to pursue what's now ChatGPT.
achrono 4 hours ago
How do we know that today's frontier models are merely scaled-up versions of that? Genuine question, since over the years the labs have narrowed what they share about how their models are trained and how they work under the hood, to the point where it is now almost nothing.