mr_00ff00 an hour ago

What is a pre-training run?

nodja 31 minutes ago | parent | next [-]

Pre-training is just training; it got the name because most models also have a post-training stage, so people call the first stage pre-training to differentiate the two.

Pre-training: You train on a vast amount of data, as varied and high quality as possible. This determines the distribution the model can operate over, which is why LLMs are usually trained on a curated dataset of more or less the whole internet. The output of pre-training is usually called the base model.
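
To make "just training" concrete, the objective is next-token prediction: given everything so far, predict the next token. A minimal PyTorch sketch (the tiny LSTM is just a stand-in for a real transformer, and all the sizes are toy numbers):

    # Minimal sketch of the pre-training objective: next-token prediction.
    # The tiny LSTM "language model" is a stand-in for a real transformer;
    # only the loss computation is the point here.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    vocab_size, d_model = 1000, 64          # toy sizes; real models are far larger

    class TinyLM(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            self.rnn = nn.LSTM(d_model, d_model, batch_first=True)
            self.head = nn.Linear(d_model, vocab_size)

        def forward(self, tokens):          # tokens: (batch, seq_len) of token ids
            h, _ = self.rnn(self.embed(tokens))
            return self.head(h)             # logits: (batch, seq_len, vocab_size)

    model = TinyLM()
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

    # Pretend this batch came from a huge tokenized text corpus.
    batch = torch.randint(0, vocab_size, (8, 128))

    logits = model(batch)
    # Position t predicts token t+1.
    loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, vocab_size),
        batch[:, 1:].reshape(-1),
    )
    loss.backward()
    opt.step()

Pre-training is essentially this loop repeated over an enormous tokenized corpus.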

Post-training: You narrow the model down to the specific behavior you need by training it further on task-specific data. You can do this in several ways:

- Supervised Finetuning (SFT): Training on a curated, high-quality dataset of the task you want. For example, if you wanted a summarization model, you'd finetune the base model on high-quality text->summary pairs, and it would summarize much better than the base model does (a rough sketch follows after this list).

- Reinforcement Learning (RL): You train a separate reward model that scores outputs, then use its ratings of the model's generations as the training signal for the model itself (the reward-model side is sketched below).

- Direct Preference Optimization (DPO): You have pairs of good/bad generations and use them to push the model towards the kinds of responses you want and away from the ones you don't (loss sketched below).
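
Rough sketch of the SFT summarization example, assuming `model` is any causal LM that maps token ids to per-token logits (like the toy one above); the detail worth noting is that the loss is typically only applied to the summary tokens, not the prompt:

    # Rough sketch of SFT on text->summary pairs. `model` is assumed to be any
    # causal LM mapping (batch, seq_len) token ids to (batch, seq_len, vocab)
    # logits. The loss is masked so only the summary tokens are trained on.
    import torch
    import torch.nn.functional as F

    def sft_loss(model, prompt_ids, summary_ids):
        tokens = torch.cat([prompt_ids, summary_ids], dim=1)
        logits = model(tokens)
        # Same next-token shift as in pre-training.
        logits, targets = logits[:, :-1], tokens[:, 1:]
        per_token = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            targets.reshape(-1),
            reduction="none",
        ).view(targets.shape)
        # Mask out the prompt positions; keep only the summary tokens.
        mask = torch.zeros_like(targets, dtype=torch.bool)
        mask[:, prompt_ids.size(1) - 1:] = True
        return per_token[mask].mean()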
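
For the RL bullet, the usual first step (as in RLHF) is training that separate reward model on human preference pairs with a pairwise loss, roughly like this toy sketch; the real thing would be an LLM backbone with a scalar head, and its scores then drive an RL algorithm such as PPO:

    # Toy sketch of the reward-model half of RLHF: train a scorer so that the
    # preferred response beats the rejected one (a pairwise Bradley-Terry style
    # loss). Real reward models are usually an LLM backbone with a scalar head.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RewardModel(nn.Module):
        def __init__(self, vocab_size=1000, d_model=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            self.score = nn.Linear(d_model, 1)

        def forward(self, tokens):                 # (batch, seq_len) token ids
            h = self.embed(tokens).mean(dim=1)     # crude pooling over the sequence
            return self.score(h).squeeze(-1)       # one scalar reward per sequence

    rm = RewardModel()
    chosen = torch.randint(0, 1000, (8, 64))       # human-preferred responses
    rejected = torch.randint(0, 1000, (8, 64))     # dispreferred responses

    # The chosen response should score higher than the rejected one.
    loss = -F.logsigmoid(rm(chosen) - rm(rejected)).mean()
    loss.backward()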
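
And the core of the DPO loss on a single good/bad pair, assuming you've already computed each response's total log-probability under the model being trained and under a frozen reference copy of it:

    # Core of the DPO loss on one good/bad pair. The inputs are summed per-token
    # log-probabilities of each response under the policy and the reference model.
    import torch.nn.functional as F

    def dpo_loss(policy_logp_good, policy_logp_bad,
                 ref_logp_good, ref_logp_bad, beta=0.1):
        # How much more the policy prefers each response than the reference does.
        good_margin = policy_logp_good - ref_logp_good
        bad_margin = policy_logp_bad - ref_logp_bad
        # Widen the gap between good and bad, scaled by beta.
        return -F.logsigmoid(beta * (good_margin - bad_margin))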

Post-training is what makes the models easy to actually use. The most common form is instruction tuning, which teaches the model to talk in turns, but post-training can be used for anything: if you want a translation model that always translates a certain way, or a model that knows how to use tools, you'd achieve that through post-training. It's where most of the secret sauce in current models is nowadays.
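
To give a feel for what instruction-tuning data looks like before it hits the model, conversations get flattened into one token stream with turn markers (the marker strings below are invented; each model family has its own chat template):

    # Illustrative only: instruction-tuning data is conversations flattened into
    # one token stream with markers for turns. The marker strings here are made
    # up; every model family defines its own chat template.
    def to_chat_text(messages):
        parts = []
        for m in messages:
            parts.append(f"<|{m['role']}|>\n{m['content']}<|end|>\n")
        return "".join(parts)

    conversation = [
        {"role": "user", "content": "Summarize: the quick brown fox..."},
        {"role": "assistant", "content": "A fox jumps over a dog."},
    ]

    # The resulting string is tokenized and trained on like the SFT example,
    # usually with the loss applied only to the assistant turns.
    print(to_chat_text(conversation))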

tim333 10 minutes ago | parent | prev | next [-]

If you've an hour to spare, this Karpathy video is good at explaining how it all works: https://youtu.be/7xTGNNLPyMI

abixb an hour ago | parent | prev | next [-]

The first step in building a large language model. That's when the model is initialized and trained on a huge dataset to learn patterns and whatnot. The "P" in "GPT" stands for "pre-trained."

bckr an hour ago | parent | prev [-]

That’s where they take their big pile of data and train the model to do next-token-prediction.