Given that 4.7 was a brand new model, trained from scratch with a unique architecture and tokenization scheme, I don't see the same pattern. It seems arbitrary.

▲

dominotw 5 hours ago | parent [-]

i dont understand the nuances here. what does this mean. 4.8 is trained on same model as previous one then? what does brand new mean.

▲

irthomasthomas 5 hours ago | parent [-]

It means for 4.7 they trained a new base model with different architecture, different pre-training data (later knowledge cutoff), and a new tokenizer. Vs finetuning an existing model, which was the case for 4.6, and probably for 4.8.

	▲	dominotw 2 hours ago \| parent \| next [-]
		do you mean pre training? so 4.8 is just post training of an old pretrained model? btw where do they tell you how they trained the model.
	▲	3 hours ago \| parent \| prev [-]
		[deleted]