mr_00ff00 an hour ago

What is a pre-training run?

nodja 31 minutes ago | parent | next [-]

Pre-training is just training; it got the name because most models also have a post-training stage, so people call the first stage pre-training to differentiate the two.

Pre-training: You train on a vast amount of data, as varied and high quality as possible. This determines the distribution the model can operate over, which is why LLMs are usually trained on a curated dataset of more or less the whole internet. The output of pre-training is usually called the base model.
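
To make "just training" concrete, the objective is next-token prediction: given everything so far, predict the next token. A minimal PyTorch sketch (the tiny LSTM is just a stand-in for a real transformer, and all the sizes are toy numbers):

    # Minimal sketch of the pre-training objective: next-token prediction.
    # The tiny LSTM "language model" is a stand-in for a real transformer;
    # only the loss computation is the point here.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    vocab_size, d_model = 1000, 64          # toy sizes; real models are far larger

    class TinyLM(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            self.rnn = nn.LSTM(d_model, d_model, batch_first=True)
            self.head = nn.Linear(d_model, vocab_size)

        def forward(self, tokens):          # tokens: (batch, seq_len) of token ids
            h, _ = self.rnn(self.embed(tokens))
            return self.head(h)             # logits: (batch, seq_len, vocab_size)

    model = TinyLM()
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

    # Pretend this batch came from a huge tokenized text corpus.
    batch = torch.randint(0, vocab_size, (8, 128))

    logits = model(batch)
    # Position t predicts token t+1.
    loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, vocab_size),
        batch[:, 1:].reshape(-1),
    )
    loss.backward()
    opt.step()

Pre-training is essentially this loop repeated over an enormous tokenized corpus.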

Post-training: You narrow the model down to the specific behavior you need by training it further on task-specific data. You can do this in several ways:

- Supervised Finetuning (SFT): Training on a curated, high-quality dataset of the task you want. For example, if you wanted a summarization model, you'd finetune the base model on high-quality text->summary pairs, and it would summarize much better than the base model does (a rough sketch follows after this list).

- Reinforcement Learning (RL): You train a separate reward model that scores outputs, then use its ratings of the model's generations as the training signal for the model itself (the reward-model side is sketched below).

- Direct Preference Optimization (DPO): You have pairs of good/bad generations and use them to push the model towards the kinds of responses you want and away from the ones you don't (loss sketched below).
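
Rough sketch of the SFT summarization example, assuming `model` is any causal LM that maps token ids to per-token logits (like the toy one above); the detail worth noting is that the loss is typically only applied to the summary tokens, not the prompt:

    # Rough sketch of SFT on text->summary pairs. `model` is assumed to be any
    # causal LM mapping (batch, seq_len) token ids to (batch, seq_len, vocab)
    # logits. The loss is masked so only the summary tokens are trained on.
    import torch
    import torch.nn.functional as F

    def sft_loss(model, prompt_ids, summary_ids):
        tokens = torch.cat([prompt_ids, summary_ids], dim=1)
        logits = model(tokens)
        # Same next-token shift as in pre-training.
        logits, targets = logits[:, :-1], tokens[:, 1:]
        per_token = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            targets.reshape(-1),
            reduction="none",
        ).view(targets.shape)
        # Mask out the prompt positions; keep only the summary tokens.
        mask = torch.zeros_like(targets, dtype=torch.bool)
        mask[:, prompt_ids.size(1) - 1:] = True
        return per_token[mask].mean()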
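
For the RL bullet, the usual first step (as in RLHF) is training that separate reward model on human preference pairs with a pairwise loss, roughly like this toy sketch; the real thing would be an LLM backbone with a scalar head, and its scores then drive an RL algorithm such as PPO:

    # Toy sketch of the reward-model half of RLHF: train a scorer so that the
    # preferred response beats the rejected one (a pairwise Bradley-Terry style
    # loss). Real reward models are usually an LLM backbone with a scalar head.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RewardModel(nn.Module):
        def __init__(self, vocab_size=1000, d_model=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            self.score = nn.Linear(d_model, 1)

        def forward(self, tokens):                 # (batch, seq_len) token ids
            h = self.embed(tokens).mean(dim=1)     # crude pooling over the sequence
            return self.score(h).squeeze(-1)       # one scalar reward per sequence

    rm = RewardModel()
    chosen = torch.randint(0, 1000, (8, 64))       # human-preferred responses
    rejected = torch.randint(0, 1000, (8, 64))     # dispreferred responses

    # The chosen response should score higher than the rejected one.
    loss = -F.logsigmoid(rm(chosen) - rm(rejected)).mean()
    loss.backward()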
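
And the core of the DPO loss on a single good/bad pair, assuming you've already computed each response's total log-probability under the model being trained and under a frozen reference copy of it:

    # Core of the DPO loss on one good/bad pair. The inputs are summed per-token
    # log-probabilities of each response under the policy and the reference model.
    import torch.nn.functional as F

    def dpo_loss(policy_logp_good, policy_logp_bad,
                 ref_logp_good, ref_logp_bad, beta=0.1):
        # How much more the policy prefers each response than the reference does.
        good_margin = policy_logp_good - ref_logp_good
        bad_margin = policy_logp_bad - ref_logp_bad
        # Widen the gap between good and bad, scaled by beta.
        return -F.logsigmoid(beta * (good_margin - bad_margin))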

Post-training is what makes the models easy to actually use. The most common form is instruction tuning, which teaches the model to talk in turns, but post-training can be used for anything: if you want a translation model that always translates a certain way, or a model that knows how to use tools, you'd achieve that through post-training. It's where most of the secret sauce in current models is nowadays.
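
To give a feel for what instruction-tuning data looks like before it hits the model, conversations get flattened into one token stream with turn markers (the marker strings below are invented; each model family has its own chat template):

    # Illustrative only: instruction-tuning data is conversations flattened into
    # one token stream with markers for turns. The marker strings here are made
    # up; every model family defines its own chat template.
    def to_chat_text(messages):
        parts = []
        for m in messages:
            parts.append(f"<|{m['role']}|>\n{m['content']}<|end|>\n")
        return "".join(parts)

    conversation = [
        {"role": "user", "content": "Summarize: the quick brown fox..."},
        {"role": "assistant", "content": "A fox jumps over a dog."},
    ]

    # The resulting string is tokenized and trained on like the SFT example,
    # usually with the loss applied only to the assistant turns.
    print(to_chat_text(conversation))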

tim333 10 minutes ago | parent | prev | next [-]

If you've an hour to spare, this Karpathy video is good at explaining how it all works: https://youtu.be/7xTGNNLPyMI

abixb an hour ago | parent | prev | next [-]

The first step in building a large language model. That's when the model is initialized and trained on a huge dataset to learn patterns and whatnot. The "P" in "GPT" stands for "pre-trained."

bckr an hour ago | parent | prev [-]

That’s where they take their big pile of data and train the model to do next-token-prediction.