Remix clone Hacker News

new | show | ask | jobs Github

	▲	irthomasthomas 5 hours ago
		It means for 4.7 they trained a new base model with different architecture, different pre-training data (later knowledge cutoff), and a new tokenizer. Vs finetuning an existing model, which was the case for 4.6, and probably for 4.8.
	▲	dominotw 3 hours ago \| parent \| next [-]
		do you mean pre training? so 4.8 is just post training of an old pretrained model? btw where do they tell you how they trained the model.
	▲	3 hours ago \| parent \| prev [-]
		[deleted]